Document References Extraction System#1025

Open
busbyk wants to merge 16 commits into refactor/esm-test-support from revalidation

Conversation

@busbyk
Collaborator

@busbyk busbyk commented Apr 6, 2026

Description

The foundation for unified document reference revalidation — a single documentReferences field on all routable collections that replaces the two parallel revalidation subsystems (block reference tracking in richText type fields + relationship reference tracking). This PR adds the field, the extraction system, and a backfill migration. The old system remains in place and continues to drive revalidation; the new system runs alongside it temporarily for comparison.

This is the first phase. #1026 implements the revalidation logic based on querying documentReferences and removes the old system.

Related Issues

First step for #455.

Key Changes

  • extractDocumentReferences — Core extraction system that recursively walks any document and extracts all relationship/upload references at any nesting depth (relationship, upload, blocks, richText/Lexical, group, array, row, collapsible, tabs).
  • Thorough test suite that uses a combination of synthetic field configs/data and fixtures modeled on production data.
  • documentReferences field — Added to all 4 routable collections (pages, posts, homePages, events). Hidden in admin UI.
  • populateDocumentReferences hook — Generic beforeChange hook that calls extractDocumentReferences and populates the field on every save (including drafts).
  • Backfill migration — Re-saves all published routable documents to trigger the hook and populate documentReferences for existing data. Drafts are not backfilled — they'll be populated naturally on their next save or not at all since they don't appear on pages (we only display published content).
  • Tenant relationship filtering — extractDocumentReferences now skips tenants references (both non-polymorphic and polymorphic), consistent with getRelationshipsFromConfig. Every tenant-scoped document has a tenant ref, but these aren't useful for revalidation tracking.
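The recursive walk described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `FieldConfig` type is a hypothetical stand-in for Payload's real field config, the name `extractRefsSketch` is made up, and blocks, richText, and tabs are left out for brevity. It shows the field-config-driven recursion, deduplication on collection + docId, and the tenants filter.

```typescript
// Hypothetical, simplified field config; Payload's real Field type is richer.
type FieldConfig =
  | { type: 'relationship'; name: string; relationTo: string | string[] }
  | { type: 'upload'; name: string; relationTo: string | string[] }
  | { type: 'group'; name: string; fields: FieldConfig[] }
  | { type: 'array'; name: string; fields: FieldConfig[] }
  | { type: 'row'; fields: FieldConfig[] } // layout-only: no data key
  | { type: 'collapsible'; fields: FieldConfig[] } // layout-only: no data key

type DocumentReference = { collection: string; docId: string | number; fieldPath: string }

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v)

// Accepts a raw ID or a populated document and returns its ID.
const toId = (v: unknown): string | number | undefined => {
  if (typeof v === 'string' || typeof v === 'number') return v
  const id = isRecord(v) ? v['id'] : undefined
  return typeof id === 'string' || typeof id === 'number' ? id : undefined
}

function extractRefsSketch(
  fields: FieldConfig[],
  data: Record<string, unknown>,
  path = '',
  seen = new Map<string, DocumentReference>(),
): DocumentReference[] {
  for (const field of fields) {
    // Layout fields store their children on the parent object.
    if (field.type === 'row' || field.type === 'collapsible') {
      extractRefsSketch(field.fields, data, path, seen)
      continue
    }
    const fieldPath = path ? `${path}.${field.name}` : field.name
    const value = data[field.name]
    if (value === null || value === undefined) continue

    if (field.type === 'group') {
      if (isRecord(value)) extractRefsSketch(field.fields, value, fieldPath, seen)
    } else if (field.type === 'array') {
      if (!Array.isArray(value)) continue
      for (let i = 0; i < value.length; i++) {
        const row: unknown = value[i]
        if (isRecord(row)) extractRefsSketch(field.fields, row, `${fieldPath}.${i}`, seen)
      }
    } else {
      // relationship / upload: treat single and hasMany values the same way.
      const items: unknown[] = Array.isArray(value) ? value : [value]
      for (const item of items) {
        let collection: string | undefined
        let rawId: unknown = item
        const polyRel = isRecord(item) ? item['relationTo'] : undefined
        if (isRecord(item) && typeof polyRel === 'string' && 'value' in item) {
          // Polymorphic values have the shape { relationTo, value }.
          collection = polyRel
          rawId = item['value']
        } else if (typeof field.relationTo === 'string') {
          collection = field.relationTo
        }
        const docId = toId(rawId)
        // Skip unresolvable values and tenant references.
        if (collection === undefined || docId === undefined || collection === 'tenants') continue
        const key = `${collection}:${docId}`
        if (!seen.has(key)) seen.set(key, { collection, docId, fieldPath })
      }
    }
  }
  return Array.from(seen.values())
}
```

Keying the `Map` on `collection:docId` keeps the first field path at which a reference is seen and drops later duplicates, which is one plausible reading of "deduplicates on collection + docId".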

How to test

  1. Run pnpm test
  2. Start the dev server, open a page or post in the admin, and save it with deeply nested relationships; documentReferences should be populated
  3. To view the field value, open the API tab

Screenshots / Demo video

https://www.loom.com/share/1e190bb2960445adb5f80492a070321a

Migration Explanation

20260404_012604_add_documentReferences_field - Adds the documentReferences field to routable collections.

20260404_015415_backfill_document_references — Data-only migration (no schema change). Iterates all published documents in pages, posts, homePages, and events, performing a no-op update (data: {}) on each. This triggers the populateDocumentReferences beforeChange hook, populating the new documentReferences field. Uses context.disableRevalidate to prevent cascading revalidation during the backfill. Draft documents are not backfilled.
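The backfill loop can be pictured along these lines. This is a sketch under stated assumptions, not the migration file itself: `LocalApiLike` is a hypothetical structural stand-in for the two Payload Local API calls involved (so the example is self-contained), the function name and page size are made up, and the `_status` filter assumes all four collections have drafts enabled.

```typescript
// Hypothetical minimal shape of the Local API calls the backfill needs.
type PagedResult = { docs: { id: string | number }[]; hasNextPage: boolean }
type LocalApiLike = {
  find(args: {
    collection: string
    where: Record<string, unknown>
    limit: number
    page: number
    depth: number
  }): Promise<PagedResult>
  update(args: {
    collection: string
    id: string | number
    data: Record<string, unknown>
    context: Record<string, unknown>
  }): Promise<unknown>
}

const ROUTABLE_COLLECTIONS = ['pages', 'posts', 'homePages', 'events']

async function backfillDocumentReferences(api: LocalApiLike, pageSize = 100): Promise<number> {
  let updated = 0
  for (const collection of ROUTABLE_COLLECTIONS) {
    let page = 1
    let hasNextPage = true
    while (hasNextPage) {
      // Only published documents are backfilled; drafts are skipped.
      const result = await api.find({
        collection,
        where: { _status: { equals: 'published' } },
        limit: pageSize,
        page,
        depth: 0,
      })
      for (const doc of result.docs) {
        // No-op update: empty data still runs the beforeChange hook,
        // while the context flag suppresses cascading revalidation.
        await api.update({
          collection,
          id: doc.id,
          data: {},
          context: { disableRevalidate: true },
        })
        updated++
      }
      hasNextPage = result.hasNextPage
      page++
    }
  }
  return updated
}
```

Paging through results keeps memory bounded on large collections. Because the update does not change a document's published status, the set being paged over stays stable while the loop runs.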

Future enhancements / Questions

The next PR will:

  1. Create unified findDocumentsWithReferences query function — Single function that queries documentReferences across all routable collections, replacing findDocumentsWithBlockReferences + findDocumentsWithRelationshipReferences
  2. Create unified revalidateDocumentReferences function — Orchestration function that calls the query function then revalidates each result
  3. Update reference collection hooks — Switch all 7 hooks from old dual-call pattern to single revalidateDocumentReferences call
  4. Remove old system — Delete old fields (blocksInContent, blocksInHighlightedContent), hooks, and utilities; generate migration to drop fields
  5. Increase revalidation intervals — Once validated, increase revalidate from 600s to 3600s+
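To make items 1 and 2 concrete, here is an in-memory sketch of the intended semantics. Everything in it is hypothetical: the real functions will live in #1026 and query the database rather than an array, and the `slug` field and path format are placeholders for illustration only.

```typescript
type DocumentReference = { collection: string; docId: string | number }
type RoutableDoc = {
  id: string | number
  collection: string
  slug: string // hypothetical: used only to build an example path
  documentReferences?: DocumentReference[]
}

// A document matches when a single reference row has BOTH the target
// collection and the target ID. IDs are compared as strings because they
// may arrive as numbers or strings depending on the database adapter.
function findDocumentsWithReferencesSketch(
  docs: RoutableDoc[],
  target: DocumentReference,
): RoutableDoc[] {
  return docs.filter((doc) =>
    (doc.documentReferences ?? []).some(
      (ref) => ref.collection === target.collection && String(ref.docId) === String(target.docId),
    ),
  )
}

// Orchestration: query once, then revalidate each matching document.
// The revalidate callback is injected so the sketch has no framework dependency.
function revalidateDocumentReferencesSketch(
  docs: RoutableDoc[],
  target: DocumentReference,
  revalidate: (path: string) => void,
): string[] {
  const paths = findDocumentsWithReferencesSketch(docs, target).map(
    (doc) => `/${doc.collection}/${doc.slug}`,
  )
  paths.forEach((path) => revalidate(path))
  return paths
}
```

Requiring both conditions on the same row matters: a document that references media 7 and sponsor 9 should not match a lookup for sponsor 7.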

Restricting lookups to configured blocks

When walking a richText field's Lexical AST, the function builds a Map<slug, Block> from the BlocksFeature's blocks array (i.e. the allowed blocks). For example:

lexicalEditor({
  features: ({ rootFeatures }) => [
    ...rootFeatures,
    BlocksFeature({ blocks: [MediaBlock, CalloutBlock, ButtonBlock] }),
  ]
})

For each type: 'block' node in the AST, it looks up fields.blockType in the map. If the blockType isn't in the allowed blocks, it's skipped (the "unknown blockType" edge case tests). If it IS found, the Block's fields schema is used to recursively extract references from that block's data.

This feels like the correct decision but I felt like it was worth noting in case someone sees an issue with this logic.
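The lookup described above can be sketched as follows. The types and the function name `collectKnownBlocks` are simplified assumptions rather than Payload's or Lexical's real types. The sketch only collects the matching block nodes; the real extraction would then recurse into each block's data using its fields schema.

```typescript
// Simplified, hypothetical shapes for a block config and a serialized Lexical node.
type BlockConfig = { slug: string; fields: { name: string; type: string }[] }
type LexicalNode = {
  type: string
  fields?: { blockType?: string; [key: string]: unknown }
  children?: LexicalNode[]
}

function collectKnownBlocks(
  root: LexicalNode,
  allowedBlocks: BlockConfig[],
): { block: BlockConfig; data: Record<string, unknown> }[] {
  // Map<slug, Block> built from the blocks the editor is configured with.
  const bySlug = new Map<string, BlockConfig>()
  for (const block of allowedBlocks) bySlug.set(block.slug, block)

  const found: { block: BlockConfig; data: Record<string, unknown> }[] = []
  const visit = (node: LexicalNode): void => {
    if (node.type === 'block' && node.fields) {
      const slug = node.fields.blockType
      const block = typeof slug === 'string' ? bySlug.get(slug) : undefined
      // Unknown blockTypes are skipped: without a schema there is no
      // reliable way to tell which values are references.
      if (block) found.push({ block, data: node.fields })
    }
    for (const child of node.children ?? []) visit(child)
  }
  visit(root)
  return found
}
```

One consequence of this design is that a block removed from the editor config but still present in stored content contributes no references until it is added back.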

@busbyk busbyk self-assigned this Apr 6, 2026
@busbyk busbyk changed the base branch from refactor/esm-test-support to main April 6, 2026 18:08
@busbyk busbyk changed the base branch from main to refactor/esm-test-support April 6, 2026 18:09
busbyk and others added 13 commits April 6, 2026 11:09
Tests for the planned extractDocumentReferences function that will walk a
document's content tree and extract all relationship/upload references into
a flat array for the unified documentReferences revalidation field.

Uses real block configs and collection configs from the codebase (not
synthetic mocks) so tests break when configs change. The BlocksFeature mock
captures its blocks arg into serverFeatureProps.blocks, allowing the
extraction function to derive block mappings from config introspection
rather than needing them passed in.

Includes JSON fixtures modeled on real dev database content structures:
- page-about-us-layout: deep nesting (ContentBlock > CalloutBlock > ButtonBlock)
- page-who-we-are-layout: direct TeamBlocks in layout
- page-supporters-layout: hasMany SponsorsBlocks with 47 sponsor refs
- post-with-media-block: post with featuredImage + MediaBlock in richText

Tests are skipped (describe.skip) until implementation exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the core extraction function that recursively walks a Payload
document's field config tree and data to find all relationship/upload
references at any nesting depth. This handles the deep nesting gap
where blocks inside richText inside blocks were previously invisible
to the revalidation system.

Handles: relationship, upload, blocks, richText (Lexical), group,
array, row, collapsible, and tabs fields. Deduplicates on
collection + docId.

Also updates the test suite: removes placeholder stub, imports real
implementation, unskips tests, fixes Events unnamed group test data
and post fixture dedup assertion. All 41 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ocumentReferences

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onship queries

eventsBlockMappings was using postsBlocks instead of eventsBlocks, and the events
collection was completely omitted from relationship reference tracking and queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…outable collections

Create a reusable documentReferencesField (array with collection, docId, blockType,
fieldPath sub-fields) and a generic populateDocumentReferences beforeChange hook that
calls extractDocumentReferences to walk the full document tree. Wire both into all 4
routable collections: Pages, Posts, HomePages, Events.

The field is hidden by default — only visible to super admins who opt in via
localStorage.setItem('showDocRefs', '1').

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the conditional visibility logic (super admin + localStorage check)
with simple disabled: true to match the blocksInContent field pattern.
The field data remains accessible via API responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cuments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

Preview deployment: https://revalidation.preview.avy-fx.org

* so it handles all field types including richText with BlocksFeature at any
* nesting depth.
*/
export const populateDocumentReferences: CollectionBeforeChangeHook = ({ data, collection }) => {
Contributor

Is it possible for there to be a cycle in the references, and, if so, what happens?

Collaborator Author

I don't believe it's possible for this to cycle. This is a beforeChange hook, so we can modify data here without re-triggering hooks.

Cycling is probably more of a concern in the next PR, #1026, but that also doesn't re-trigger hooks: the revalidation system only calls Next.js' revalidatePath, which does not change any data in the db or interact with the Payload Local API or REST API beyond find queries.

Good thought though, and we should ensure this isn't possible, so I added a test for it in #1026: ce5e8d8 (that PR is still in draft atm, fyi, so hold off on a full review until I mark it ready).

@busbyk busbyk changed the title from Revalidation to Document References Extraction System Apr 6, 2026
busbyk and others added 2 commits April 6, 2026 15:46
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@busbyk busbyk marked this pull request as ready for review April 7, 2026 15:41
@busbyk busbyk mentioned this pull request Apr 9, 2026

2 participants