Document References Extraction System by busbyk · Pull Request #1025 · NWACus/web

busbyk · 2026-04-06T18:06:32Z

Description

The foundation for unified document reference revalidation — a single documentReferences field on all routable collections that replaces the two parallel revalidation subsystems (block reference tracking in richText type fields + relationship reference tracking). This PR adds the field, the extraction system, and a backfill migration. The old system remains in place and continues to drive revalidation; the new system runs alongside it temporarily for comparison.

This is the first phase. #1026 implements the revalidation logic based on querying documentReferences and removes the old system.

Related Issues

First step for #455.

Key Changes

extractDocumentReferences — Core extraction system that recursively walks any document and extracts all relationship/upload references at any nesting depth (relationship, upload, blocks, richText/Lexical, group, array, row, collapsible, tabs).
Thorough test suite that uses a combination of synthetic field configs/data and fixtures modeled on production data.
documentReferences field — Added to all 4 routable collections (pages, posts, homePages, events). Hidden in admin UI.
populateDocumentReferences hook — Generic beforeChange hook that calls extractDocumentReferences and populates the field on every save (including drafts).
Backfill migration — Re-saves all published routable documents to trigger the hook and populate documentReferences for existing data. Drafts are not backfilled — they'll be populated naturally on their next save or not at all since they don't appear on pages (we only display published content).
Tenant relationship filtering — extractDocumentReferences now skips tenants references (both non-polymorphic and polymorphic), consistent with getRelationshipsFromConfig. Every tenant-scoped document has a tenant ref, but these aren't useful for revalidation tracking.
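
The recursive walk described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `FieldConfig` union is a simplified stand-in for Payload's real field types, `extractRefs` and `toRef` are hypothetical names, and tabs and richText handling are left out for brevity.

```typescript
// Simplified stand-ins for Payload's field config types (hypothetical shapes, not the real ones).
type FieldConfig =
  | { type: 'relationship' | 'upload'; name: string; relationTo: string | string[]; hasMany?: boolean }
  | { type: 'group' | 'array'; name: string; fields: FieldConfig[] }
  | { type: 'row' | 'collapsible'; fields: FieldConfig[] }
  | { type: 'blocks'; name: string; blocks: { slug: string; fields: FieldConfig[] }[] }

interface DocumentReference {
  collection: string
  docId: string | number
}

type Data = Record<string, unknown>

const isRecord = (value: unknown): value is Data =>
  typeof value === 'object' && value !== null && !Array.isArray(value)

// A stored value may be a bare ID, a populated doc ({ id }), or a polymorphic { relationTo, value }.
function toRef(value: unknown, relationTo: string | string[]): DocumentReference | null {
  if (isRecord(value)) {
    const polymorphicTarget = value['relationTo']
    if (typeof polymorphicTarget === 'string' && 'value' in value) {
      return toRef(value['value'], polymorphicTarget)
    }
  }
  if (typeof relationTo !== 'string') return null
  const id = isRecord(value) ? value['id'] : value
  if (typeof id === 'string' || typeof id === 'number') return { collection: relationTo, docId: id }
  return null
}

function extractRefs(fields: FieldConfig[], data: Data, out: DocumentReference[] = []): DocumentReference[] {
  for (const field of fields) {
    switch (field.type) {
      case 'relationship':
      case 'upload': {
        const raw = data[field.name]
        const values: unknown[] = Array.isArray(raw) ? raw : [raw]
        for (const value of values) {
          const ref = toRef(value, field.relationTo)
          // Tenant references are skipped: every tenant-scoped document has one.
          if (ref && ref.collection !== 'tenants') out.push(ref)
        }
        break
      }
      case 'group': {
        const value = data[field.name]
        if (isRecord(value)) extractRefs(field.fields, value, out)
        break
      }
      case 'array': {
        const rows = data[field.name]
        if (Array.isArray(rows)) {
          for (const row of rows) {
            if (isRecord(row)) extractRefs(field.fields, row, out)
          }
        }
        break
      }
      case 'row':
      case 'collapsible':
        // Layout-only fields store their children at the same data level.
        extractRefs(field.fields, data, out)
        break
      case 'blocks': {
        const rows = data[field.name]
        if (!Array.isArray(rows)) break
        for (const row of rows) {
          if (!isRecord(row)) continue
          const block = field.blocks.find((candidate) => candidate.slug === row['blockType'])
          if (block) extractRefs(block.fields, row, out)
        }
        break
      }
    }
  }
  return out
}
```

The config tree drives the recursion and the data is only read where the config says a reference can live, which is why references nested inside blocks inside arrays are found without any per-collection mapping.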

How to test

Run pnpm test
Start dev server, open a page/post in the admin, save it with deeply nested relationships — documentReferences is populated
To view the field value, open the API tab.

Screenshots / Demo video

https://www.loom.com/share/1e190bb2960445adb5f80492a070321a

Migration Explanation

20260404_012604_add_documentReferences_field - Adds the documentReferences field to routable collections.

20260404_015415_backfill_document_references — Data-only migration (no schema change). Iterates all published documents in pages, posts, homePages, and events, performing a no-op update (data: {}) on each. This triggers the populateDocumentReferences beforeChange hook, populating the new documentReferences field. Uses context.disableRevalidate to prevent cascading revalidation during the backfill. Draft documents are not backfilled.
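
For readers unfamiliar with this pattern, the backfill loop might look roughly like the sketch below. It assumes a `LocalApi` interface that mirrors only the subset of Payload's Local API used here (`find` and `update`); the interface, the `backfillDocumentReferences` name, and the page size are illustrative assumptions rather than the migration's actual code.

```typescript
// Minimal stand-in for the parts of Payload's Local API this sketch uses (assumed shape).
interface PagedResult {
  docs: { id: string | number }[]
  hasNextPage: boolean
}

interface LocalApi {
  find(args: {
    collection: string
    where: Record<string, unknown>
    limit: number
    page: number
    depth: number
  }): Promise<PagedResult>
  update(args: {
    collection: string
    id: string | number
    data: Record<string, unknown>
    context: Record<string, unknown>
  }): Promise<unknown>
}

const ROUTABLE_COLLECTIONS = ['pages', 'posts', 'homePages', 'events']

async function backfillDocumentReferences(payload: LocalApi, pageSize = 100): Promise<number> {
  let updated = 0
  for (const collection of ROUTABLE_COLLECTIONS) {
    let page = 1
    let hasNextPage = true
    while (hasNextPage) {
      // Only published documents are backfilled; drafts are left alone.
      const result = await payload.find({
        collection,
        where: { _status: { equals: 'published' } },
        limit: pageSize,
        page,
        depth: 0,
      })
      for (const doc of result.docs) {
        // Empty update: its only purpose is to run the beforeChange hook.
        await payload.update({
          collection,
          id: doc.id,
          data: {},
          context: { disableRevalidate: true },
        })
        updated += 1
      }
      hasNextPage = result.hasNextPage
      page += 1
    }
  }
  return updated
}
```

Passing `disableRevalidate` through `context` keeps the re-save from fanning out into revalidation calls while the backfill runs.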

Future enhancements / Questions

The next PR will:

Create unified findDocumentsWithReferences query function — Single function that queries documentReferences across all routable collections, replacing findDocumentsWithBlockReferences + findDocumentsWithRelationshipReferences
Create unified revalidateDocumentReferences function — Orchestration function that calls the query function then revalidates each result
Update reference collection hooks — Switch all 7 hooks from old dual-call pattern to single revalidateDocumentReferences call
Remove old system — Delete old fields (blocksInContent, blocksInHighlightedContent), hooks, and utilities; generate migration to drop fields
Increase revalidation intervals — Once validated, increase revalidate from 600s to 3600s+
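
As a rough picture of the planned orchestration (query first, then revalidate each result), here is a sketch under stated assumptions. The query and revalidate functions are injected so the example stays self-contained; the `pathFor` URL scheme and all type names are hypothetical, and the real signatures in #1026 may differ.

```typescript
interface RoutableDoc {
  collection: string
  slug: string
}

interface Reference {
  collection: string
  docId: string | number
}

// Stand-ins for the planned query function and for Next.js's revalidatePath.
type FindDocuments = (ref: Reference) => Promise<RoutableDoc[]>
type Revalidate = (path: string) => void

// Hypothetical URL scheme; the real routes are defined by the app.
const pathFor = (doc: RoutableDoc): string => `/${doc.collection}/${doc.slug}`

async function revalidateDocumentReferences(
  ref: Reference,
  find: FindDocuments,
  revalidate: Revalidate,
): Promise<string[]> {
  const docs = await find(ref)
  // Deduplicate so a document matched more than once is revalidated only once.
  const paths = Array.from(new Set(docs.map(pathFor)))
  for (const path of paths) revalidate(path)
  return paths
}
```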

Restricting lookups to configured blocks

When walking a richText field's Lexical AST, the function builds a Map<slug, Block> from the BlocksFeature's blocks array (i.e. the allowed blocks). For example:

lexicalEditor({
  features: ({ rootFeatures }) => [
    ...rootFeatures,
    BlocksFeature({ blocks: [MediaBlock, CalloutBlock, ButtonBlock] }),
  ]
})

For each type: 'block' node in the AST, it looks up fields.blockType in the map. If the blockType isn't in the allowed blocks, it's skipped (the "unknown blockType" edge case tests). If it IS found, the Block's fields schema is used to recursively extract references from that block's data.
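
The lookup described above could be sketched like this. The node and block shapes are simplified assumptions (real Lexical nodes and Payload Block configs carry more properties), and `collectBlockNodes` is a hypothetical helper name.

```typescript
// Simplified shapes; real Lexical nodes and Payload Block configs have more properties.
interface BlockConfig {
  slug: string
  fields: unknown[]
}

interface LexicalNode {
  type: string
  fields?: Record<string, unknown>
  children?: LexicalNode[]
}

interface MatchedBlock {
  block: BlockConfig
  data: Record<string, unknown>
}

function collectBlockNodes(root: LexicalNode, allowedBlocks: BlockConfig[]): MatchedBlock[] {
  // Build the slug -> Block map from the blocks configured on the BlocksFeature.
  const bySlug = new Map<string, BlockConfig>()
  for (const block of allowedBlocks) bySlug.set(block.slug, block)

  const matched: MatchedBlock[] = []
  const visit = (node: LexicalNode): void => {
    if (node.type === 'block' && node.fields) {
      const blockType = node.fields['blockType']
      const block = typeof blockType === 'string' ? bySlug.get(blockType) : undefined
      // Unknown blockTypes are skipped: without a schema there is nothing to walk.
      if (block) matched.push({ block, data: node.fields })
    }
    for (const child of node.children ?? []) visit(child)
  }
  visit(root)
  return matched
}
```

Each matched pair would then be handed to the recursive field walker, with `block.fields` as the schema and `data` as the block's stored values.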

This feels like the correct decision but I felt like it was worth noting in case someone sees an issue with this logic.

Tests for the planned extractDocumentReferences function that will walk a document's content tree and extract all relationship/upload references into a flat array for the unified documentReferences revalidation field.

Uses real block configs and collection configs from the codebase (not synthetic mocks) so tests break when configs change. The BlocksFeature mock captures its blocks arg into serverFeatureProps.blocks, allowing the extraction function to derive block mappings from config introspection rather than needing them passed in.

Includes JSON fixtures modeled on real dev database content structures:
- page-about-us-layout: deep nesting (ContentBlock > CalloutBlock > ButtonBlock)
- page-who-we-are-layout: direct TeamBlocks in layout
- page-supporters-layout: hasMany SponsorsBlocks with 47 sponsor refs
- post-with-media-block: post with featuredImage + MediaBlock in richText

Tests are skipped (describe.skip) until implementation exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add the core extraction function that recursively walks a Payload document's field config tree and data to find all relationship/upload references at any nesting depth. This handles the deep nesting gap where blocks inside richText inside blocks were previously invisible to the revalidation system.

Handles: relationship, upload, blocks, richText (Lexical), group, array, row, collapsible, and tabs fields. Deduplicates on collection + docId.

Also updates the test suite: removes placeholder stub, imports real implementation, unskips tests, fixes Events unnamed group test data and post fixture dedup assertion. All 41 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
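
Deduplicating on collection + docId, as mentioned in the commit message above, can be illustrated with a small sketch. The `dedupeReferences` name and the composite string key are assumptions for illustration; the first occurrence of each pair is kept.

```typescript
interface DocumentReference {
  collection: string
  docId: string | number
  fieldPath?: string
}

function dedupeReferences(refs: DocumentReference[]): DocumentReference[] {
  const seen = new Set<string>()
  const unique: DocumentReference[] = []
  for (const ref of refs) {
    // Composite key; note that docId 1 and docId '1' collapse to the same key in this sketch.
    const key = `${ref.collection}:${String(ref.docId)}`
    if (seen.has(key)) continue
    seen.add(key)
    unique.push(ref)
  }
  return unique
}
```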

…onship queries eventsBlockMappings was using postsBlocks instead of eventsBlocks, and the events collection was completely omitted from relationship reference tracking and queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…outable collections Create a reusable documentReferencesField (array with collection, docId, blockType, fieldPath sub-fields) and a generic populateDocumentReferences beforeChange hook that calls extractDocumentReferences to walk the full document tree. Wire both into all 4 routable collections: Pages, Posts, HomePages, Events. The field is hidden by default — only visible to super admins who opt in via localStorage.setItem('showDocRefs', '1'). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the conditional visibility logic (super admin + localStorage check) with simple disabled: true to match the blocksInContent field pattern. The field data remains accessible via API responses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
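
For orientation, the resulting field might look roughly like the config fragment below. The sub-field names come from the commit messages above; the `text` type on every sub-field is an assumption, and the object is written as a plain literal rather than typed against Payload's `Field` type so the fragment stands alone.

```typescript
// Sketch of the reusable field; sub-field types are assumed, not taken from the PR.
const documentReferencesField = {
  name: 'documentReferences',
  type: 'array',
  // Hidden from the admin UI, but still returned in API responses.
  admin: { disabled: true },
  fields: [
    { name: 'collection', type: 'text' },
    { name: 'docId', type: 'text' },
    { name: 'blockType', type: 'text' },
    { name: 'fieldPath', type: 'text' },
  ],
} as const
```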

github-actions · 2026-04-06T18:26:53Z

Preview deployment: https://revalidation.preview.avy-fx.org

stevekuznetsov · 2026-04-06T18:27:24Z

src/hooks/populateDocumentReferences.ts

+ * so it handles all field types including richText with BlocksFeature at any
+ * nesting depth.
+ */
+export const populateDocumentReferences: CollectionBeforeChangeHook = ({ data, collection }) => {


Is it possible for there to be a cycle in the references, and, if so, what happens?

busbyk · 2026-04-07

I don't believe it's possible for this to cycle. This is a beforeChange hook, so we can modify data here and that does not re-trigger hooks.

Cycling is probably more of a concern in the next PR, #1026, but that also does not re-trigger hooks: the revalidation system only calls Next.js's revalidatePath, which does not change any data in the db and, apart from find queries, does not interact with the Payload Local API or REST API.

Good thought though, and we should ensure this isn't possible, so I added a test for that in #1026: ce5e8d8. (That PR is still in draft atm, so hold off on a full review until I mark it ready.)
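
To make the "no re-trigger" point concrete, here is a minimal sketch of the hook's shape. The argument type is a cut-down stand-in for Payload's beforeChange hook arguments, and `makePopulateHook` with its injected `extract` function is a hypothetical construction for illustration only.

```typescript
interface Ref {
  collection: string
  docId: string | number
}

// Cut-down stand-in for the arguments a Payload beforeChange hook receives.
interface BeforeChangeArgs {
  data: Record<string, unknown>
  collection: { slug: string }
}

type Extract = (collectionSlug: string, data: Record<string, unknown>) => Ref[]

// The hook only returns modified data for the save already in progress.
// It performs no write of its own, so there is nothing to trigger another hook run.
const makePopulateHook =
  (extract: Extract) =>
  ({ data, collection }: BeforeChangeArgs) => ({
    ...data,
    documentReferences: extract(collection.slug, data),
  })
```

Because the references are recomputed from the incoming data on every save, a stale `documentReferences` value on the document is simply overwritten.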

busbyk self-assigned this Apr 6, 2026

busbyk had a problem deploying to Preview April 6, 2026 18:06 — with GitHub Actions Failure

busbyk changed the base branch from refactor/esm-test-support to main April 6, 2026 18:08

busbyk changed the base branch from main to refactor/esm-test-support April 6, 2026 18:09

busbyk and others added 13 commits April 6, 2026 11:09

Fix lint errors: replace type assertions with type guards in extractD…

2f78897

…ocumentReferences Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add comparison logging between old and new revalidation systems

c250693

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add migration to backfill documentReferences for existing routable do…

7236b17

…cuments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add migration index and snapshot for backfill_document_references

e96f831

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adding tests for hasMany polymorphic relationship fields + updating c…

678f78f

…omments

removing unnecessary ternary

7464772

removing custom field access + updating function comments

e071f53

removing unnecessary comparison system

75fc68a

busbyk force-pushed the revalidation branch from 7c46f23 to 75fc68a Compare April 6, 2026 18:10

busbyk temporarily deployed to Preview April 6, 2026 18:10 — with GitHub Actions Inactive

stevekuznetsov reviewed Apr 6, 2026

busbyk changed the title ~~Revalidation~~ Document References Extraction System Apr 6, 2026

busbyk and others added 2 commits April 6, 2026 15:46

filtering tenant relationship fields from extractDocumentReferences

416961f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

removing unnecessary test comments

4ea4f3a

busbyk temporarily deployed to Preview April 6, 2026 22:48 — with GitHub Actions Inactive

support inline blocks in extractDocumentReferences

fca1b2d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

busbyk temporarily deployed to Preview April 6, 2026 23:57 — with GitHub Actions Inactive

busbyk marked this pull request as ready for review April 7, 2026 15:41

busbyk requested review from rchlfryn and stevekuznetsov April 7, 2026 15:41

busbyk mentioned this pull request Apr 9, 2026

Unified Revalidation #1026

Open

busbyk wants to merge 16 commits into refactor/esm-test-support from revalidation
