Add query_documents for advanced document querying#71

Open
sfkmk wants to merge 6 commits into baruchiro:main from sfkmk:feature/query-documents-tool-call

sfkmk commented Apr 4, 2026

Summary

Expose Paperless' richer document query capabilities through MCP so agents can filter documents server-side instead of scanning broad result sets client-side.

  • add query_documents as the canonical document query tool
  • keep list_documents focused on simple listing
  • keep search_documents as a compatibility wrapper
  • support custom_field_query and validated paperless_filters
  • add tests and README examples
  • update .gitignore for local development files

Verification

  • npm test
  • npm run build

Summary by CodeRabbit

Release Notes

  • New Features

    • Added query_documents tool for advanced document querying with full-text search and structured Paperless-style filters (including custom field filtering).
  • Documentation

    • Expanded list_documents documentation with additional pagination, sorting, and filter options.
    • Marked search_documents as a deprecated compatibility wrapper; recommended using query_documents.
  • Refactor

    • Centralized document query execution shared across document tools.
  • Tests

    • Added coverage for query-string generation/serialization, validation, and error cases.
  • Chores

    • Updated ignore rules to exclude bun.lock and temp/.

Copilot AI review requested due to automatic review settings April 4, 2026 17:29
changeset-bot Bot commented Apr 4, 2026

🦋 Changeset detected

Latest commit: 463351a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @baruchiro/paperless-mcp | Minor |

coderabbitai Bot commented Apr 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 117f8706-a8f5-43bd-be5c-d092052fd67e

📥 Commits

Reviewing files that changed from the base of the PR and between 907cdac and 463351a.

📒 Files selected for processing (1)
  • src/tools/documents.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/tools/documents.test.ts

Walkthrough

A new query_documents MCP tool is introduced alongside a new src/tools/utils/documentQuery.ts module that centralizes URL query-string construction. The module defines Zod schemas for custom_field_query (leaf clauses and recursive AND/OR groups), paperless_filters (record of scalar or array values), and three argument shape constants (LIST_DOCUMENTS_ARGS_SHAPE, QUERY_DOCUMENTS_ARGS_SHAPE, SEARCH_DOCUMENTS_ARGS_SHAPE). A buildDocumentQueryString function maps first-class args to Paperless parameter names, validates paperless_filters keys against an OpenAPI-derived allowlist, and JSON-encodes structured custom_field_query values. In documents.ts, a shared executeDocumentQuery helper is added and list_documents, query_documents, and search_documents (now a deprecated wrapper) all delegate through it. PaperlessAPI.searchDocuments is removed. Tests and README are updated accordingly.
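
The query-string construction described in the walkthrough can be sketched roughly as follows. This is an illustration only, not the PR's code: the map entries, the allowlist contents, the hypothetical `created_after` argument name, and the comma-joined array serialization are all assumptions; only the function and constant names `buildDocumentQueryString` and `FIRST_CLASS_QUERY_PARAM_MAP` come from the thread.

```typescript
type Scalar = string | number | boolean;
type FilterValue = Scalar | Scalar[];

// Assumed subset of the first-class argument -> Paperless parameter mapping.
// `created_after` is a hypothetical argument name used only for illustration.
const FIRST_CLASS_QUERY_PARAM_MAP: Record<string, string> = {
  query: "query",
  page: "page",
  page_size: "page_size",
  created_after: "created__date__gte",
};

// Assumed allowlist; the PR derives its list from the OpenAPI spec.
const ALLOWED_PAPERLESS_FILTERS = new Set<string>([
  "archive_serial_number",
  "archive_serial_number__isnull",
  "custom_fields__icontains",
  "tags__id__in",
]);

interface DocumentQueryArgs {
  [key: string]: unknown;
  paperless_filters?: Record<string, FilterValue>;
  custom_field_query?: unknown;
}

function buildDocumentQueryString(args: DocumentQueryArgs): string {
  const params = new URLSearchParams();

  // 1. Map first-class arguments to Paperless parameter names.
  for (const [argName, paramName] of Object.entries(FIRST_CLASS_QUERY_PARAM_MAP)) {
    const value = args[argName];
    if (value !== undefined && value !== null) {
      params.append(paramName, String(value));
    }
  }

  // 2. Reject any paperless_filters key that is not on the allowlist.
  for (const [key, value] of Object.entries(args.paperless_filters ?? {})) {
    if (!ALLOWED_PAPERLESS_FILTERS.has(key)) {
      throw new Error(`Unsupported paperless_filters key: ${key}`);
    }
    // Assumption: array values are serialized as one comma-separated value.
    params.append(key, Array.isArray(value) ? value.join(",") : String(value));
  }

  // 3. JSON-encode structured custom_field_query values; pass raw strings through.
  const cfq = args.custom_field_query;
  if (cfq !== undefined) {
    params.append("custom_field_query", typeof cfq === "string" ? cfq : JSON.stringify(cfq));
  }

  return params.toString();
}
```

Throwing on an unknown filter key, rather than silently dropping it, matches the "validated paperless_filters" wording in the summary, but the exact error behaviour in the PR may differ.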

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • baruchiro/paperless-mcp#19: Renamed created__gte/created__lte to created__date__gte/created__date__lte in list_documents, the same parameter names now used in the new query-string builder.
  • baruchiro/paperless-mcp#118: Added the same filter parameters (archive_serial_number, archive_serial_number__isnull, custom_field_query, custom_fields__icontains) to the document query surface that this PR integrates into the unified builder.

Suggested reviewers

  • baruchiro
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add query_documents for advanced document querying' accurately reflects the main change: introduction of a new query_documents tool for advanced document query functionality. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Copilot AI left a comment

Pull request overview

This PR adds a new MCP tool (query_documents) to expose Paperless-NGX’s richer /api/documents/ query capabilities (full-text query, custom field query, and an allowlisted set of documented filters), while keeping list_documents focused on simple listing and retaining search_documents as a compatibility wrapper.

Changes:

  • Added a shared query builder/validator (buildDocumentQueryString, custom_field_query validation, and allowlisted paperless_filters).
  • Introduced query_documents and refactored list_documents / search_documents to share the same execution path.
  • Added test coverage for query serialization, validation, and OpenAPI allowlist sync; updated README usage examples and .gitignore.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/tools/utils/documentQuery.ts | Implements validated query argument shapes and builds a safe/allowlisted query string for /api/documents/. |
| src/tools/documents.ts | Refactors document retrieval to use a shared query executor and adds the query_documents tool. |
| src/tools/documents.test.ts | Adds tests for query string building, custom field query validation, and allowlist sync with the OpenAPI spec. |
| README.md | Documents query_documents as the canonical advanced query tool and updates list_documents / search_documents guidance. |
| .gitignore | Adds ignores for local dev artifacts (bun.lock, temp/). |

Comment thread src/tools/documents.test.ts Outdated
Comment thread src/tools/documents.ts
sfkmk commented Apr 4, 2026

Author

@baruchiro - I'm noting partial overlap with #70.

This PR is narrower in overall scope, but different in query design:

So I do not think this is a duplicate, but there is overlap in src/tools/documents.ts.

If #70 lands first, I can rebase this PR on top of it and adapt the implementation to the new annotations/test-helper shape. The main open question is API direction: extending list_documents vs keeping advanced Paperless querying in query_documents.

baruchiro added claude and removed claude labels May 18, 2026

baruchiro left a comment

Owner

The overall structure is good: extracting buildDocumentQueryString into its own module is clean, the FIRST_CLASS_QUERY_PARAM_MAP approach is solid, and the OpenAPI-sync test is a clever guard. A few things need attention before merging.


Dead code

PaperlessAPI.searchDocuments is now unreachable.
After this PR, the search_documents tool routes through executeDocumentQuery → api.getDocuments(), so api.searchDocuments() is never called. Remove it from PaperlessAPI.ts.

queryDocumentsArgsSchema is exported but unused.
documentQuery.ts exports queryDocumentsArgsSchema = z.object(QUERY_DOCUMENTS_ARGS_SHAPE) but nothing imports it. Either use it where the shape is consumed or drop the export.


Redundant tests

Per project convention, tests should check real input/output. Several tests here duplicate what the buildDocumentQueryString unit tests already cover, or assert on schema metadata rather than behaviour.

"list_documents keeps existing simple query behavior": The parameter-to-query-string mapping is already exercised by "serializes first-class list filters using Paperless parameter names". Going through the tool handler adds no new signal; the handler is a single return executeDocumentQuery(api, args) line.

"search_documents remains a query-only compatibility wrapper": This asserts on Object.keys(schema) and a description regex. Neither is an I/O check. The actual query forwarding (one param, same key) is trivially covered by the serializer tests.

"query_documents exposes advanced query fields and uses shared execution": The schema-key and description-regex assertions (assert.ok("custom_field_query" in ...), assert.match(...description, /custom field/i)) are registration tests, not behaviour tests. The query execution part overlaps with the serializer unit tests. Trim this down to the execution path, or remove it entirely if nothing distinct is being tested.

If the test "custom_field_query JSON schema avoids tuple-style items arrays" is also removed (it tests zod-to-json-schema's output format, an internal detail), the zod-to-json-schema dev dependency can be dropped along with it.


Minor

isCustomFieldQuery accepts group operators as leaf field names.
["AND", "iexact", "foo"] passes the leaf branch (length === 3, both strings, valid value) before the group branch is checked. In practice Paperless field names won't be "AND"/"OR", but adding an early exclusion makes the intent explicit:

// leaf branch
if (
  value.length === 3 &&
  typeof value[0] === "string" &&
  !CUSTOM_FIELD_QUERY_GROUP_OPERATORS.includes(value[0] as any) &&
  typeof value[1] === "string" &&
  isCustomFieldQueryValue(value[2])
) {  }
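
For context, a self-contained sketch of how the whole validator might look with that exclusion in place. The shapes are assumptions based on this thread: a leaf is `[field, operator, value]`, a group is `[operator, [clause, ...]]`, and the accepted leaf values (scalars, null, or arrays of scalars) are a guess at what `isCustomFieldQueryValue` allows. The `isGroupOperator` helper is hypothetical and replaces the `as any` cast with a type guard.

```typescript
const CUSTOM_FIELD_QUERY_GROUP_OPERATORS = ["AND", "OR"] as const;
type GroupOperator = (typeof CUSTOM_FIELD_QUERY_GROUP_OPERATORS)[number];

// Hypothetical helper: narrows a value to "AND" | "OR" without casting to any.
function isGroupOperator(value: unknown): value is GroupOperator {
  return (
    typeof value === "string" &&
    (CUSTOM_FIELD_QUERY_GROUP_OPERATORS as readonly string[]).includes(value)
  );
}

// Assumption: leaf values are scalars, null, or flat arrays of those.
function isCustomFieldQueryValue(value: unknown): boolean {
  const isScalar = (v: unknown): boolean =>
    v === null || ["string", "number", "boolean"].includes(typeof v);
  return isScalar(value) || (Array.isArray(value) && value.every(isScalar));
}

function isCustomFieldQuery(value: unknown): boolean {
  if (!Array.isArray(value)) return false;

  // Group branch: ["AND" | "OR", [clause, clause, ...]] with at least one clause.
  if (value.length === 2 && isGroupOperator(value[0]) && Array.isArray(value[1])) {
    return value[1].length > 0 && value[1].every((clause) => isCustomFieldQuery(clause));
  }

  // Leaf branch: [field, operator, value]; group operators are not valid field names.
  return (
    value.length === 3 &&
    typeof value[0] === "string" &&
    !isGroupOperator(value[0]) &&
    typeof value[1] === "string" &&
    isCustomFieldQueryValue(value[2])
  );
}
```

With this ordering, `["AND", "iexact", "foo"]` is rejected, while `["OR", [["amount", "gt", 100], ["paid", "exact", true]]]` is accepted.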

Generated by Claude Code

sfkmk commented Jun 19, 2026

Author

Updated after review.

Changes:

  • Merged current upstream/main and resolved conflicts while preserving the newer list_documents filters from #118 ("feat: Add archive_serial_number and custom_fields filters to list_documents").
  • Removed the dead PaperlessAPI.searchDocuments method and the unused queryDocumentsArgsSchema export.
  • Removed redundant schema/handler tests and the direct zod-to-json-schema dev dependency.
  • Tightened custom_field_query validation so AND/OR are not accepted as leaf field names.
  • Kept query_documents as the structured advanced query tool while preserving raw-string custom_field_query compatibility in list_documents.

Verification:

  • npm ci
  • npm test
  • npm run build

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/tools/documents.test.ts`:
- Around line 26-31: In the getDocumentQueryParamsFromOpenApi function, add
explicit checks for the indexOf results before using them to slice the section.
Both the start index (from indexOf for "/api/documents/:") and end index (from
indexOf for "/api/documents/{id}/:") should be validated to ensure they are not
-1, which indicates the marker was not found. If either index is -1, throw an
error with a clear message that identifies which marker was missing, then
proceed with the text.slice operation only after both validations pass.
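
A minimal sketch of the guarded slice this comment asks for. The function name `sliceDocumentsSection` and the error wording are hypothetical; only the two marker strings come from the comment, and the real `getDocumentQueryParamsFromOpenApi` presumably does further parsing on the result.

```typescript
// Returns the part of the OpenAPI text between the list endpoint marker and
// the detail endpoint marker, failing loudly if either marker is missing.
function sliceDocumentsSection(text: string): string {
  const startMarker = "/api/documents/:";
  const endMarker = "/api/documents/{id}/:";

  const start = text.indexOf(startMarker);
  if (start === -1) {
    throw new Error(`OpenAPI marker not found: ${startMarker}`);
  }

  // Search for the end marker only after the start marker.
  const end = text.indexOf(endMarker, start);
  if (end === -1) {
    throw new Error(`OpenAPI marker not found: ${endMarker}`);
  }

  return text.slice(start, end);
}
```

Without the checks, a missing end marker would make `text.slice(start, -1)` return nearly the whole remainder of the file, so the allowlist-sync test could pass or fail for the wrong reason.
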
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 704df819-27ad-4a7f-a813-ae0ff913b7c2

📥 Commits

Reviewing files that changed from the base of the PR and between 11e9d2d and 907cdac.

📒 Files selected for processing (5)
  • README.md
  • src/api/PaperlessAPI.ts
  • src/tools/documents.test.ts
  • src/tools/documents.ts
  • src/tools/utils/documentQuery.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/tools/documents.ts

Comment thread src/tools/documents.test.ts
3 participants