feat(connector): Notion database target#2049
Open
badmonster0 wants to merge 17 commits into
Open
Conversation
Adds cocoindex.connectors.notion — a declarative target connector for Notion databases (data sources in the 2025-09-03 API), mirroring the two-level pattern from connectors.postgres / connectors.sqlite. User declares a Python row class and calls declare_row(); CocoIndex keeps the Notion data source in sync — creating new pages, patching changed rows, and archiving pages whose source row falls out of the declared set. The archive-on-undeclare path is the main thing the hand-rolled HTTP plumbing in cocoindex-gtm couldn't do. Phase 1 scope: - managed_by="user" only — the data source must exist and be shared with the integration. Schema is validated against the live data source at mount; mismatches fail loudly instead of producing empty cells at write time. - 9 property types: title, rich_text, number, url, email, select, multi_select, date, checkbox. - Query-on-miss page_id resolution (one PK-filter call per cache miss, cached for the rest of the run). - 3 req/s rate limit + Retry-After honored + tenacity retry on 429. - OnDelete.ARCHIVE (default) / HARD / IGNORE. Out of scope (follow-ups): - managed_by="system" additive mode (auto-create the data source, PATCH new properties as the dataclass grows; destructive ops blocked unless allow_destructive=True). - RelationProp / PeopleProp / FilesProp. - Automated test suite (validated end-to-end by hand against a real Notion workspace; CI gating on NOTION_TEST_TOKEN is a follow-up). Includes: - python/cocoindex/connectors/notion/ — _client.py, _types.py, _target.py, __init__.py - examples/notion_target_basics/ — minimal demo with README - docs/src/content/docs/connectors/notion.mdx + sidebar entry - pyproject.toml: notion optional extra (aiohttp, tenacity) Design doc: https://www.notion.so/372daa511a0880fea8c2d9852b1d9f82 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Type all `dict` annotations with `dict[str, Any]` (strict mypy mode requires explicit type parameters for generics). - Add `tenacity` to the mypy missing-imports overrides — tenacity has no type stubs, so the @Retry decorator was tagged as untyped. - Annotate _provider on DatabaseTarget, fix Any-returning page_id and memo_key returns by tagging the local variable type. - Re-format with ruff to match CI version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
managed_by="system" mode (additive)
-----------------------------------
- New kwargs on mount_database_target / database_target /
declare_database_target: managed_by, parent_page_id,
parent_database_id, title, allow_destructive.
- System mode looks under the given parent for a Notion database / data
source with the matching title; finds-or-creates on first run via
POST /v1/databases or POST /v1/data_sources, and PATCH-adds new
properties on subsequent runs when the dataclass grows.
- Destructive changes (existing property's type changed) are rejected
at mount unless allow_destructive=True. Type signatures are kept tight
with a new ManagedBy = Literal["user", "system"] alias.
- New per-PropType to_notion_schema() returns the schema body for create
/ PATCH calls. SelectProp / MultiSelectProp gained an optional
`options=("Foo", "Bar")` field for pre-declared select options.
- New DatabaseSchema methods: to_notion_properties() (the full body for
create) and diff_against() (additive-vs-destructive split, shared
between user-mode validation and system-mode evolution).
- _client.py: 3 new methods (create_database, create_data_source,
update_data_source_properties) and get_database. _request is now a
manual retry loop (no @Retry decorator) for clean typing.
Test suite
----------
- python/tests/connectors/test_notion_target.py: 8 cases.
- 3 schema-validation tests run without Notion access (typo in
property_map, two-titles check, managed_by-args validation).
- 5 integration tests gated by NOTION_TEST_TOKEN + NOTION_TEST_PARENT_PAGE:
insert/update/archive, on_delete=IGNORE behavior, schema-mismatch
type/missing checks, and system-mode create+evolve. Each integration
test creates its own temp data source and archives it in teardown.
Notes
-----
- pyproject.toml: notion extra no longer depends on tenacity (replaced
with an inline manual retry); tenacity removed from `all` and from
the mypy missing-imports overrides.
- Docs page (docs/src/content/docs/connectors/notion.mdx) updated with
a system-mode example and the full new mount signature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original lifecycle test used distinct App names per update call, which gives cocoindex no prior tracking record to reconcile against — so the archive step silently no-op'd. Reuse the same App name across all steps so the tracking carries forward. Same fix for test_on_delete_ignore_leaves_page. New tests (12 total now, all passing locally with NOTION_TEST_TOKEN + NOTION_TEST_PARENT_PAGE): - test_noop_when_no_changes: re-run with identical rows must not touch Notion (verified via last_edited_time on each page being unchanged). - test_on_delete_hard: OnDelete.HARD path actually removes the page from active queries. - test_property_types_roundtrip: title + rich_text + number + url + checkbox + select + date all encode -> Notion -> decode without corruption. - test_first_run_against_existing_page: if a page with the declared PK already exists in Notion (pre-seeded), the connector PATCHes it instead of POSTing a duplicate (exercises the query-on-miss happy hit path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a gap from the design doc's 14-case test plan: two mount_database_target calls in one app must sync independently. Catches the class of bug where per-target state (page_id cache, asyncio locks, tracking record identity) would accidentally be shared across targets. Verifies isolation at both insert (each target gets its own row) and undeclare (dropping rows from one target doesn't affect the other). Uses coco.use_mount with explicit component_subpath so each target gets a stable, distinct subpath across runs — same pattern as test_sqlite_target.test_multiple_tables. The other gap from the original 14-case plan, same-PK dedup, is left out by design: it's a cocoindex framework invariant (declare_target_state collapses same-StableKey calls before the connector sees anything), and neither test_sqlite_target.py nor test_postgres_target.py tests it for the same reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds RelationProp to the supported property type set so users with
linked Notion data sources (e.g. Signals -> Account + Developer in the
cocoindex-gtm CRM) can port to the new target.
Encoding takes a list of page IDs (or a single string); decoding returns
the list back. to_notion_schema() emits the minimal `{"relation": {}}`
when no target_database_id is provided (sufficient for user-managed
mode where the column already exists), or the full single_property body
when target_database_id is given (for managed_by="system" create).
Verified end-to-end via the cocoindex-gtm pipeline port:
- 20 GitHub signals processed; 6 wrote to Notion (the rest skipped
because the user has no resolved company — existing GTM rule).
- Each Notion Signal row has the right ID, Account relation, and
Developer relation.
- Re-run with one stargazer dropped via GTM_SKIP_USERS=badmonster0:
cocoindex reports `process_signal: 20 total | 19 reprocessed,
1 deleted` and the connector archives the orphaned Notion page.
Notion confirms 5 active rows (was 6) and 0 active badmonster0 rows.
This is the regression test for George's concern from the design doc:
declare a row, stop declaring it, assert it's archived. Until now the
hand-rolled notion_client.py could only upsert; the new target makes
the delete path automatic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
georgeh0
reviewed
Jun 1, 2026
Match every other target connector (postgres, sqlite, qdrant, lancedb, …), which default managed_by to "system". The Notion connector was the lone outlier at "user". Make managed_by="user" explicit in the user-mode example, docs snippets, and tests that pass data_source_id positionally, since those now require user mode to be requested. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Reorder ManagedBy Literal to ["system", "user"] (default first), matching the other connector docs/types. - Drop stale "in the follow-up" framing for system mode — it's the default and implemented now. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Restructure the basics example and docs walkthrough to teach the managed_by="system" path first (CocoIndex creates + evolves the "People" database under a parent page), with managed_by="user" demoted to a variant. Setup now uses NOTION_PARENT_PAGE instead of NOTION_DATA_SOURCE_ID. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
cocoindex.connectors.notion— a declarative target for Notion databases (called data sources in the 2025-09-03 API), with the same upsert + automatic-delete reconcile semantics asconnectors.postgres.Two modes:
managed_by="user"(default) — point at an existing data source; the connector validates the live property schema matches at mount, then syncs rows.managed_by="system"— give the connector a parent page or database plus a title; it finds or creates the data source on first run, PATCH-adds new properties as the dataclass grows, and rejects destructive changes unlessallow_destructive=True.The archive-on-undeclare path is the main thing the hand-rolled HTTP code in cocoindex-gtm couldn't do — drop a row from the declared set, re-run, the matching Notion page gets archived.
Design doc (approved): https://www.notion.so/372daa511a0880fea8c2d9852b1d9f82
What's in this PR
Connector (
python/cocoindex/connectors/notion/):NotionClient— token-scoped, async, rate-limited (3 req/s semaphore),Retry-After-aware exponential backoff (inline manual retry — no tenacity dep).DatabaseSchema— binds a Python row class to Notion properties viaAnnotated[T, notion.SomeProp(...)](orproperty_map={...}). Validates against the live data source at mount;diff_against()splits additive from destructive changes.mount_database_target(client, data_source_id=None, schema, *, managed_by, parent_page_id, parent_database_id, title, on_delete, allow_destructive)— plus the lower-leveldatabase_targetanddeclare_database_target.TitleProp,RichTextProp,NumberProp,UrlProp,EmailProp,SelectProp,MultiSelectProp,DateProp,CheckboxProp. Each hasencode/decode/to_notion_schema()for create/PATCH bodies.OnDelete.ARCHIVE(default) /HARD/IGNORE.Test suite (
python/tests/connectors/test_notion_target.py, 12 cases — all passing locally against a real Notion workspace):test_property_map_typo_raisestest_schema_requires_at_most_one_titletest_managed_by_args_validationNOTION_TEST_TOKEN+NOTION_TEST_PARENT_PAGE:test_insert_update_archive— full lifecycle: insert 3, update 1, drop 1 → drops archivedtest_on_delete_ignore_leaves_page—OnDelete.IGNOREdoesn't touch the page on undeclaretest_on_delete_hard—OnDelete.HARDactually removes from active queriestest_noop_when_no_changes— re-run with identical data → zero PATCHes (verified vialast_edited_time)test_property_types_roundtrip— title + rich_text + number + url + checkbox + select + date all round-trip cleanlytest_first_run_against_existing_page— pre-seeded page with declared PK gets PATCHed, not duplicated (query-on-miss happy path)test_schema_validation_type_mismatch— wrong type at mount →ValueError, zero writestest_schema_validation_missing_property— missing column at mount →ValueError, zero writestest_system_creates_and_evolves— system mode creates the data source on first run, PATCH-adds a new property on the second runSuite runs in ~42s end-to-end. Each integration test creates its own temp data source and archives it in teardown.
Docs:
docs/src/content/docs/connectors/notion.mdxcovers connection setup, both modes, all property types, delete strategies, page-id persistence, and the four Notion-API setup gotchas (integration sharing, parent access, internal vs public integrations, API version pinning). Sidebar entry added.Example:
examples/notion_target_basics/— runnable demo withPersonrows + README showing the insert / no-op / archive lifecycle.Packaging:
pyproject.tomlgains anotionoptional extra (aiohttponly).Deferred to follow-up PRs
RelationProp/PeopleProp/FilesProp— these prop types aren't enumerated in the design doc and relations specifically need cross-target ordering, which is its own design problem. Out of scope here, happy to addRelationPropif you want.CI status
fast-check(ruff format/lint, end-of-file fixers, etc.): passe2e-type-check(strict mypy, Python 3.11–3.14): pass on all four versionsbuild-test(Rust compile + pytest across Linux/macOS/Windows): in progress — Linux 3.11, Linux 3.14, macOS 3.11 already passing.Test plan
pytest python/tests/connectors/test_notion_target.py, ~42s).from cocoindex.connectors import notion).examples/notion_target_basicsdemo also still works.🤖 Generated with Claude Code