fix(gatsby-source-strapi): nested relation cache invalidation#524

Closed
molund wants to merge 2 commits into
gatsby-uc:mainfrom
molund:fix-nested-strapi-caching

Conversation

@molund molund commented Mar 10, 2026

Description

Gatsby's cache invalidation for gatsby-source-strapi only considered the top-level updatedAt/publishedAt fields on a Strapi entry. When a nested relation was updated in Strapi, the parent entry's top-level timestamps remained unchanged — causing Gatsby to treat the entry as unmodified and serve stale data.

Solution

Added a findLatestDates utility to clean-data.js that recursively traverses the entire cleaned entry — including all nested objects and arrays — and returns the maximum updatedAt and publishedAt values found anywhere in the document tree.

cleanData now overwrites the top-level updatedAt and publishedAt with these maximums before returning, ensuring that any change to any relation at any depth of nesting will correctly bust the cache.

Compatible with both Strapi v4 and v5 because findLatestDates operates on the output of cleanAttributes, not on the raw API response.
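The traversal described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper `keepLatest` and the sample `entry` are made up for illustration, and it assumes the cleaned entry is plain JSON-like data with ISO 8601 date strings and no circular references.

```javascript
// Hypothetical sketch; not the implementation from this PR.
// Returns the latest updatedAt / publishedAt found anywhere in a cleaned entry.
function findLatestDates(entry) {
  const latest = { updatedAt: undefined, publishedAt: undefined };

  // Keep `candidate` when it parses as a date later than the current maximum.
  function keepLatest(key, candidate) {
    if (typeof candidate !== "string") return;
    const time = Date.parse(candidate);
    if (Number.isNaN(time)) return;
    if (latest[key] === undefined || time > Date.parse(latest[key])) {
      latest[key] = candidate;
    }
  }

  // Visit every array element and every object value, at any depth.
  function traverse(node) {
    if (Array.isArray(node)) {
      for (const item of node) traverse(item);
    } else if (node !== null && typeof node === "object") {
      keepLatest("updatedAt", node.updatedAt);
      keepLatest("publishedAt", node.publishedAt);
      for (const value of Object.values(node)) traverse(value);
    }
  }

  traverse(entry);
  return latest;
}

// Made-up entry: the nested section is newer than the parent.
const entry = {
  id: 1,
  updatedAt: "2026-01-01T00:00:00.000Z",
  publishedAt: "2026-01-01T00:00:00.000Z",
  sections: [
    {
      id: 7,
      updatedAt: "2026-03-01T00:00:00.000Z",
      author: { id: 3, updatedAt: "2026-02-01T00:00:00.000Z", publishedAt: null },
    },
  ],
};

console.log(findLatestDates(entry));
```

In this sketch, objects without date fields contribute nothing, so an entry with no timestamps anywhere yields `undefined` for both values.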

Related Issues

Fixes #523

changeset-bot Bot commented Mar 10, 2026

⚠️ No Changeset found

Latest commit: bdb930d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@moonmeister
Contributor

Thanks for the PR! I'll likely take a look next week, or maybe @laurenskling will have time, as he's the resident Strapi expert.

Contributor

@laurenskling laurenskling left a comment

Interesting. I have this use case myself: when a user adds a piece of content and relates it to a Page, Gatsby doesn't update that Page because the Page itself did not receive an update. I guess this PR is trying to fix that?


// recursive function to traverse the entire data object
function traverse(node) {
  if (Array.isArray(node)) {
Contributor

Thinking out loud here: are we assuming an array is always a set of relations? What else in Strapi could be an array... images? I guess we would want this for image dates as well. Anything else?

Author

@molund molund Mar 14, 2026

For this change, our goal is strictly to derive the maximum updatedAt and publishedAt values from nested relations. The traversal logic will descend into other content structures (components, dynamic zones, media, JSON), but we'll only collect dates from nested objects that have updatedAt / publishedAt fields. If other nested objects happen to use the same updatedAt / publishedAt naming convention, that would simply be an added bonus.

We don't really use the Strapi media library to its full potential on bcparks.ca, so I'm not overly familiar with what the image responses might look like. But they won't break the generic traversal logic.

return {
...cleanAttributes(getAttributes(data, version), currentContentTypeSchema, schemas, version),
...cleaned,
updatedAt: latest.updatedAt || cleaned.updatedAt || undefined,
Contributor

cleaned.updatedAt is redundant here, right? latest.updatedAt already covers it, because cleaned went through findLatestDates.

Author

@molund molund Mar 14, 2026

If I make the change suggested in https://github.com/gatsby-uc/plugins/pull/524/changes#r2935638874, this comment can be resolved.

version,
);
const latest = findLatestDates(cleaned);
return {
Contributor

Should we actually mutate updatedAt and publishedAt here? Someone may be sorting on one of these fields, so I'm not sure we want them changed. Isn't the goal achieved by adding a new value? Adding a changedAt field holding the result of findLatestDates (which would then need to return only one date) should still bust the cache, right?

Author

I'm not familiar with the internals of Gatsby's caching, but if changedAt would also invalidate the digest, then it's probably the less destructive option. publishedAt matters much more in Strapi 5 than it did in Strapi 4, so a combined field should take the maximum of both values.
I would probably prefer something more generic like strapi_timestamp, to align with strapi_id and strapi_component, which already exist in the Gatsby GraphQL schema. The prefix will prevent name collisions with user-defined fields.
If you agree with strapi_timestamp, I'll push a change.
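A combined field along these lines might be computed as in the sketch below. It is only an illustration: `latestTime` and `withStrapiTimestamp` are hypothetical names, not functions from the plugin, and whether an extra field like this actually invalidates Gatsby's cache is not confirmed anywhere in this thread.

```javascript
// Hypothetical sketch of a combined timestamp field; not code from this PR.
const DATE_KEYS = ["updatedAt", "publishedAt"];

// Returns the latest timestamp (ms since epoch) found in the tree,
// or -Infinity when the tree contains no parseable dates.
function latestTime(node) {
  if (Array.isArray(node)) {
    return node.reduce((max, item) => Math.max(max, latestTime(item)), -Infinity);
  }
  if (node === null || typeof node !== "object") return -Infinity;

  let max = -Infinity;
  for (const [key, value] of Object.entries(node)) {
    if (DATE_KEYS.includes(key) && typeof value === "string") {
      const time = Date.parse(value);
      if (!Number.isNaN(time)) max = Math.max(max, time);
    } else {
      max = Math.max(max, latestTime(value));
    }
  }
  return max;
}

// Adds strapi_timestamp and leaves updatedAt / publishedAt untouched,
// so sorting on the original fields keeps working.
function withStrapiTimestamp(cleaned) {
  const time = latestTime(cleaned);
  if (time === -Infinity) return { ...cleaned };
  return { ...cleaned, strapi_timestamp: new Date(time).toISOString() };
}

// Made-up entry: the related item is newer than the page itself.
const page = {
  updatedAt: "2026-01-01T00:00:00.000Z",
  publishedAt: "2026-01-05T00:00:00.000Z",
  related: [{ updatedAt: "2026-03-01T12:30:00.000Z" }],
};

console.log(withStrapiTimestamp(page));
```

Taking one maximum across both date keys reflects the point above that a combined field should cover updatedAt and publishedAt together.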

Contributor

I think we need to be sure that this will actually trigger Gatsby's cache invalidation; I'm not sure either. Maybe @moonmeister knows more about this area?

Contributor

strapi_timestamp sounds good to me

Author

molund commented Mar 14, 2026

Interesting. I have this use case myself: when a user adds a piece of content and relates it to a Page, Gatsby doesn't update that Page because the Page itself did not receive an update. I guess this PR is trying to fix that?

Exactly. bcparks.ca has a very deeply nested populate structure on the protected-area collection. This is where we are seeing the issue.
https://github.com/bcgov/bcparks.ca/blob/main/src/gatsby/gatsby-config.js
https://bcparks.ca/find-a-park/

@moonmeister
Contributor

Possibly ignorant question here, is there a reason we don't treat these items as their own nodes instead of nesting within the parent?

Author

molund commented Mar 15, 2026

Possibly ignorant question here, is there a reason we don't treat these items as their own nodes instead of nesting within the parent?

We could treat them as their own nodes (which the plugin already supports) but then we’d need to stitch them back together at render time. The deep queries populate approach avoids that by treating the parent as a materialized “view” and asking the Strapi API to populate the relations, so we get one already‑joined payload per record and generate one top‑level node per Gatsby page.

@laurenskling
Contributor

Possibly ignorant question here, is there a reason we don't treat these items as their own nodes instead of nesting within the parent?

We could treat them as their own nodes (which the plugin already supports) but then we’d need to stitch them back together at render time. The deep queries populate approach avoids that by treating the parent as a materialized “view” and asking the Strapi API to populate the relations, so we get one already‑joined payload per record and generate one top‑level node per Gatsby page.

This is not really true. Relations are always turned into nodes (see https://github.com/gatsby-uc/plugins/blob/main/packages/gatsby-source-strapi/src/normalize.js#L216), so you can already query your relations at the top level in Gatsby. It's irrelevant that you've used deep query populate (which is also the only way to get relations). They are stitched back together with ${attributeName}___NODE.

But since your suggested change is at the clean-data stage, which runs when we fetch and before we create nodes, this field will be added.

It will only work if you populate the updatedAt field, though, so it won't do anything in my own codebase right now, as I only populate relation IDs. (I do this because merging a queried populate and a top-level fetch of the same content type can conflict, so I top-level query all fields and populate only relation IDs, which point to the fully fetched relation.) So fetching updatedAt is required per content type to make this work, as it won't look into other fetched data.

Author

molund commented Apr 2, 2026

I'm going to close this PR because I discovered that my fix doesn't actually populate the nested relations.

I tried to fix the code that populates the nested relations in normalize.js, but I haven't been able to get it working without errors and warnings that I have been unable to resolve.

The code for populating relations is insert-only, with no working upsert behaviour:

        if (Array.isArray(value)) {
          const mediaNodes = value.map((relation) => prepareMediaNode(relation, config));
          entity[`${attributeName}___NODE`] = mediaNodes.map(({ id }) => id);

          for (const node of mediaNodes) {
            if (!getNode(node.id)) {
              nodes.push(node);
            }
          }
        } else {
          const mediaNode = prepareMediaNode(value, config);

          entity[`${attributeName}___NODE`] = mediaNode.id;

          const relationNodeToCreate = getNode(mediaNode.id);

          if (!relationNodeToCreate) {
            nodes.push(mediaNode);
          }
        }

When I try to implement upsert behaviour there are minor schema differences that can't be resolved. It might be an issue with the timing of the schema auto-discovery but I'm not sure.

@molund molund closed this Apr 2, 2026
@laurenskling
Contributor

@molund, reading between the lines, the issue you are experiencing is that nested queried data doesn't update at all, right?

What you want to do is fetch only the id for a relation and query that related content type itself from the root.

The GraphQL schema will link these two together, making it possible to use nested data in your GraphQL queries. You don't need to do that at fetch time. It's even better not to, as it results in the same Node being merged together, possibly mismatching and losing data (for example, when a second or third fetch query requests fewer fields).

With this, updates will update that root content, and nested GraphQL queries will receive them as well.
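As a rough illustration of that setup, the plugin options might look like the sketch below. The content types `page` and `article` and the relation field `articles` are made up, and the `populate` / `fields` shape is an assumption based on Strapi's REST query parameters; check it against your Strapi version before relying on it.

```javascript
// Hypothetical gatsby-source-strapi options. "page" and "article" are made-up
// content types, and "articles" is a made-up relation field on "page".
const strapiConfig = {
  apiURL: process.env.STRAPI_API_URL,
  accessToken: process.env.STRAPI_TOKEN,
  collectionTypes: [
    {
      singularName: "page",
      queryParams: {
        // Populate only the id of each related article, not its full content.
        populate: { articles: { fields: ["id"] } },
      },
    },
    // Fetch the related content type itself from the root, with all its fields.
    { singularName: "article" },
  ],
  singleTypes: [],
};

// In gatsby-config.js this object would be passed as the plugin options:
// { resolve: "gatsby-source-strapi", options: strapiConfig }
console.log(JSON.stringify(strapiConfig.collectionTypes, null, 2));
```

With a configuration like this, each article is fetched once as its own root entry, and the page only carries a reference to it.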

Development

Successfully merging this pull request may close these issues.

bug(gatsby-source-strapi): cache is not invalidated when nested relations have newer updatedAt/publishedAt

3 participants