
include the pages activeSdk in algolia search results #3409

Open
NWylynko wants to merge 8 commits into main from nick/include-guides-sdk-in-search

Conversation

@NWylynko
Contributor

@NWylynko NWylynko commented May 29, 2026

🔎 Previews:

What does this solve? What changed?

  • Adds the guide's active SDK to the Algolia search results, allowing search to boost results that match the user's active SDK

Deadline

Other resources

@NWylynko NWylynko requested a review from a team as a code owner May 29, 2026 21:56
@vercel

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| clerk-docs | Ready | Preview | Jun 2, 2026 7:55pm |


@NWylynko NWylynko requested a review from manovotny May 29, 2026 22:04
manovotny and others added 3 commits May 30, 2026 16:22
Additively register branch/record_batch/sdk for faceting (required by the active-SDK optionalFilters boost and the stale-record cleanup; self-sufficient on a fresh index). Enforce a ranking order with attribute/exact above proximity so title/heading matches outrank body-content matches — the fix for buried guides/quickstarts. The indexer is now the source of truth for these settings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ curated)

Acronym⇄expansion pairs are auto-derived from the _tooltips glossary (self-syncing as tooltips are added); a curated list covers product-rename/phrasing synonyms (magic link→email link, login→sign in, i18n→localization, etc.). Enforced every run via saveSynonyms(replaceExistingSynonyms), same model as ranking. Fixes broken queries: 'magic link'/'i18n'/'DKIM' went from garbage/zero results to relevant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
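
The hybrid synonym model in the commit above could be sketched roughly as follows. This is a minimal illustration, not the code in scripts/update-algolia-records.ts: the `GlossaryEntry` shape, the `buildSynonyms` helper, and the sample glossary data are hypothetical, and every pair is modeled as a two-way synonym for simplicity (the real script may use one-way synonyms for renames).

```typescript
// Hypothetical shapes: the real _tooltips glossary format is not shown in this PR.
type GlossaryEntry = { acronym: string; expansion: string }
type SynonymRecord = { objectID: string; type: 'synonym'; synonyms: string[] }

// Turn a phrase into a stable id fragment, e.g. "magic link" -> "magic-link".
function toSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-|-$/g, '')
}

function buildSynonyms(glossary: GlossaryEntry[], curated: [string, string][]): SynonymRecord[] {
  // Auto-derived: one acronym/expansion pair per glossary entry.
  const derived = glossary.map((entry): SynonymRecord => ({
    objectID: `glossary-${toSlug(entry.acronym)}`,
    type: 'synonym',
    synonyms: [entry.acronym, entry.expansion],
  }))
  // Curated: product-rename and phrasing pairs maintained by hand.
  const manual = curated.map(([from, to]): SynonymRecord => ({
    objectID: `curated-${toSlug(from)}`,
    type: 'synonym',
    synonyms: [from, to],
  }))
  return [...derived, ...manual]
}

// Sample inputs (assumed for illustration; the curated pairs come from the commit message).
const sampleSynonyms = buildSynonyms(
  [{ acronym: 'DKIM', expansion: 'DomainKeys Identified Mail' }],
  [
    ['magic link', 'email link'],
    ['login', 'sign in'],
    ['i18n', 'localization'],
  ],
)
console.log(sampleSynonyms.map((record) => record.objectID))
```

Because the full list is regenerated and replaced on every run, a tooltip added to the glossary would show up as a synonym on the next index without a separate edit.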
Note that the indexer (scripts/update-algolia-records.ts) is the source of truth for the search index's faceting, ranking, and synonyms, and that tuning them in the Algolia dashboard gets reverted on the next index run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@manovotny
Contributor

Beyond the per-variant sdk field, this branch now also codifies the docs index's faceting, ranking, and synonyms in scripts/update-algolia-records.ts, so they're enforced on every index run instead of living as mutable dashboard state.

What's in this PR specifically:

  • sdk added to attributesForFaceting (filterOnly) — what makes the optionalFilters: sdk:<active> boost on the client side work at all.
  • Ranking reorder enforced in the indexer — attribute/exact above proximity, so title/heading matches beat body-content matches.
  • Hybrid synonyms — acronyms auto-derived from the _tooltips glossary + a curated phrasing list (fixes magic link, i18n, DKIM, login, …).
  • AGENTS.md: don't tune ranking/synonyms/faceting in the Algolia dashboard — they're codified here and revert on every reindex.

Paired with the client-side active-SDK boost in clerk/clerk#2661. The full data-backed writeup — why boost sdk over availableSDKs, the ranking reorder, the synonym set, and every rejected alternative with measurements — lives there: https://github.com/clerk/clerk/pull/2661#issuecomment-4585030268
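
To make the faceting requirement concrete, here is a rough sketch of what the client-side boost might look like. The real implementation lives in clerk/clerk#2661 and may differ; `buildSearchParams` and the use of `facetFilters` for the branch scope are assumptions for illustration. The point is that `optionalFilters` can only reference `sdk` once that attribute is registered in attributesForFaceting.

```typescript
// Hypothetical helper: names and shape are assumptions, not the code from clerk/clerk#2661.
type SearchParams = {
  query: string
  facetFilters: string[]
  optionalFilters: string[]
}

function buildSearchParams(query: string, branch: string, activeSdk?: string): SearchParams {
  return {
    query,
    // Hard filter: only records indexed from this branch are eligible.
    facetFilters: [`branch:${branch}`],
    // Soft filter: records whose `sdk` matches rank higher but nothing is excluded.
    optionalFilters: activeSdk ? [`sdk:${activeSdk}`] : [],
  }
}

const boostedParams = buildSearchParams('sign in', 'main', 'nextjs')
console.log(boostedParams)
```

With no active SDK the optional filter list is empty, so results fall back to the plain ranking order.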

Comment thread scripts/update-algolia-records.ts Outdated
Comment on lines +866 to +875
await algolia.setSettings({
  indexName: ALGOLIA_INDEX_NAME,
  indexSettings: {
    attributesForFaceting: [
      ...attributesForFaceting,
      ...missingFacets.map((attribute) => `filterOnly(${attribute})`),
    ],
    ranking,
  },
})
Contributor Author


Suggested change
- await algolia.setSettings({
-   indexName: ALGOLIA_INDEX_NAME,
-   indexSettings: {
-     attributesForFaceting: [
-       ...attributesForFaceting,
-       ...missingFacets.map((attribute) => `filterOnly(${attribute})`),
-     ],
-     ranking,
-   },
- })
+ await algolia.setSettings({
+   indexName: ALGOLIA_INDEX_NAME,
+   indexSettings: {
+     attributesForFaceting: [
+       ...attributesForFaceting,
+       ...missingFacets.map((attribute) => `filterOnly(${attribute})`),
+     ],
+     ranking,
+   },
+   forwardToReplicas: true,
+ })

@manovotny should the setSettings call also forward the change to replicas? Granted, I don't think we are using replicas, but I think we should be consistent.

Contributor


Good Q — went back and forth on this, landed on: no, setSettings shouldn't forward.

We're codifying these settings (declare + overwrite every run), which is itself the mechanism for keeping indexes consistent. forwardToReplicas is the other model — "tune the primary, propagate it" — so under codification it's redundant: the script configures whatever index it targets directly.

It's also the riskier default here specifically because this setSettings bundles ranking/customRanking. A standard replica usually exists to hold a different sort, and forwardToReplicas: true would overwrite that on every index run. So if we ever add replicas, the right move is to declare them in the script (which can express per-replica settings), not blanket-forward.

The synonyms call does forward, and that's deliberate — synonyms should always be identical across replicas, so it's a safe no-op today and the correct behavior if a replica ever shows up. Added a comment in efe7c94 spelling out the asymmetry so it doesn't read like an oversight.

Two notes: there are no replicas on dev_docs/prod_docs today, so this is all forward-looking; and heads up the suggestion is anchored to the pre-refactor block (the read/merge/missingFacets code is gone as of d7d3955 / 9d61ccc), so it can't be committed directly.
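
The asymmetry described in this reply can be sketched as two request builders. The request types below are local stand-ins written for this example rather than imports from the Algolia client, and the helper names are hypothetical; the field names follow the calls quoted in this thread (`setSettings`, `saveSynonyms`, `forwardToReplicas`, `replaceExistingSynonyms`), with `synonymHit` assumed as the payload key.

```typescript
// Local stand-in types (assumptions), not the Algolia client's own definitions.
type SettingsRequest = {
  indexName: string
  indexSettings: { attributesForFaceting: string[]; ranking: string[] }
  forwardToReplicas: boolean
}
type SynonymsRequest = {
  indexName: string
  synonymHit: { objectID: string; type: 'synonym'; synonyms: string[] }[]
  forwardToReplicas: boolean
  replaceExistingSynonyms: boolean
}

function buildSettingsRequest(indexName: string, facets: string[], rankingOrder: string[]): SettingsRequest {
  return {
    indexName,
    indexSettings: { attributesForFaceting: facets, ranking: rankingOrder },
    // Not forwarded: a replica may exist precisely to hold a different sort.
    forwardToReplicas: false,
  }
}

function buildSynonymsRequest(indexName: string, synonymHit: SynonymsRequest['synonymHit']): SynonymsRequest {
  return {
    indexName,
    synonymHit,
    // Forwarded: synonyms should be identical on every replica.
    forwardToReplicas: true,
    // Full replace on every run, same model as the settings.
    replaceExistingSynonyms: true,
  }
}

const settingsRequest = buildSettingsRequest('dev_docs', ['filterOnly(sdk)'], ['typo', 'words'])
const synonymsRequest = buildSynonymsRequest('dev_docs', [
  { objectID: 'curated-login', type: 'synonym', synonyms: ['login', 'sign in'] },
])
console.log(settingsRequest.forwardToReplicas, synonymsRequest.forwardToReplicas)
```

Keeping the two flags in separate builders makes the difference visible in one place, which is the same goal as the code comment added in efe7c94.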

Overwrite attributesForFaceting and ranking on every index run instead of
reading current settings, diffing, and additively merging facets. The indexer
is the source of truth for these settings, so dashboard edits should revert on
the next run -- but the additive facet merge was preserving the very drift it
was meant to overwrite (ranking already overwrote; faceting now matches).

setSettings is a top-level partial merge, so we declare only the keys we own
and leave customRanking/searchableAttributes untouched.

availableSDKs is intentionally not faceted: the client only retrieves it to
render per-result SDK icons (Search.tsx SDKsIcon), never filters or counts on
it, and retrieval is independent of faceting. branch/record_batch are
filterOnly -- the stale-record cleanup uses facetFilters, never facet counts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
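
To illustrate why the additive merge preserved drift, here is a small comparison of the two approaches described in the commit message above. The `additiveMerge` function is a reconstruction of the general read/diff/merge pattern, not the removed code itself, and the drifted dashboard state is a made-up example.

```typescript
// The facets the indexer owns, as listed in this PR.
const ownedFacets = ['branch', 'record_batch', 'sdk']

// Strip a filterOnly(...) wrapper so "branch" and "filterOnly(branch)" compare equal.
function bareFacetName(facet: string): string {
  return facet.replace(/^filterOnly\((.*)\)$/, '$1')
}

// Old model (reconstructed, an assumption): keep whatever is there, append what is missing.
function additiveMerge(current: string[], required: string[]): string[] {
  const present = new Set(current.map(bareFacetName))
  const missing = required.filter((attribute) => !present.has(attribute))
  return [...current, ...missing.map((attribute) => `filterOnly(${attribute})`)]
}

// New model: ignore the current state and write the declared list.
function declareAndOverwrite(required: string[]): string[] {
  return required.map((attribute) => `filterOnly(${attribute})`)
}

// Hypothetical drifted state: a stray facet plus a facet that is not filterOnly.
const driftedFacets = ['availableSDKs', 'branch', 'filterOnly(record_batch)']

console.log(additiveMerge(driftedFacets, ownedFacets))
console.log(declareAndOverwrite(ownedFacets))
```

In this sketch the merged result still carries the stray `availableSDKs` facet and the unwrapped `branch`, while the overwritten result contains only the three declared filterOnly facets.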
@manovotny
Contributor

manovotny commented Jun 2, 2026

Follow-up: the indexer now codifies the docs index's relevance settings — declares them and overwrites on every run. d7d3955, 9d61ccc

Acting on review feedback: instead of read current settings → diff → additively merge each run, the indexer now declares the settings it owns and overwrites them every run, so any dashboard edit reverts on the next reindex. The additive facet merge actually worked against the goal — it preserved whatever was set in the dashboard rather than overwriting it, so indexes could (and did) drift apart.

const searchableAttributes = ['unordered(hierarchy.lvl0)', /* …lvl1–6 */ 'content', 'unordered(keywords)']
const attributesForFaceting = ['branch', 'record_batch', 'sdk'].map((a) => `filterOnly(${a})`)
const ranking = ['typo', 'geo', 'words', 'filters', 'attribute', 'exact', 'proximity', 'custom']
const customRanking = ['desc(weight.pageRank)', 'desc(weight.level)', 'asc(weight.position)']
await algolia.setSettings({
  indexName: ALGOLIA_INDEX_NAME,
  indexSettings: { searchableAttributes, attributesForFaceting, ranking, customRanking },
})

Why these four and not the whole settings object (i.e. we don't codify defaults — correct): setSettings is a top-level partial merge, so we declare only the relevance levers we deliberately own and leave everything else (typo tolerance, pagination, highlighting, and Algolia's server-managed defaults) untouched. Snapshotting the full object would freeze those defaults at snapshot time and silently clobber any future default change or intentional tweak — diff noise and maintenance for zero benefit. Synonyms stay generated from _tooltips (already a full replace).
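
The "top-level partial merge" behavior can be modeled with a tiny simulation. This is not Algolia's implementation, only a sketch of the semantics described above; `applyPartialSettings` and the sample values are hypothetical.

```typescript
type SettingsObject = Record<string, unknown>

// Simulated semantics (an assumption): each declared key replaces the stored
// value wholesale, and keys that are not declared are left as they were.
function applyPartialSettings(current: SettingsObject, patch: SettingsObject): SettingsObject {
  return { ...current, ...patch }
}

// Hypothetical stored state: one owned key plus two the indexer does not own.
const storedSettings: SettingsObject = {
  ranking: ['typo', 'geo', 'words', 'filters', 'proximity', 'attribute', 'exact', 'custom'],
  hitsPerPage: 20,
  typoTolerance: true,
}

const updatedSettings = applyPartialSettings(storedSettings, {
  ranking: ['typo', 'geo', 'words', 'filters', 'attribute', 'exact', 'proximity', 'custom'],
})

console.log(updatedSettings)
```

Here `ranking` is replaced by the declared order while `hitsPerPage` and `typoTolerance` pass through unchanged, which is why declaring only the owned keys avoids freezing everything else.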

The two added in 9d61ccc8f:

  • searchableAttributes — the corpus + attribute priority the attribute ranking criterion rides on (it's what makes a heading match beat a body match, so the reorder depends on it). Identical across dev/prod today, so this locks in the value the ranking work was tested against rather than freezing an unknown state.
  • customRanking — the final tiebreaker, normalized to the weights the indexer actually writes (pageRank/level/position). Drops the dead desc(weight.popularity) entry lingering on dev_docs — it's never written to records, so it ranked nothing.
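
As a rough illustration of the customRanking tiebreaker, the three declared criteria behave like a chained comparator. This is a simplified model: in Algolia the custom criterion only applies after all earlier ranking criteria tie, the numeric `weight` fields are an assumption about the record shape, and the sample records are invented.

```typescript
// Assumed record shape: only the weight fields named in customRanking.
type WeightedRecord = {
  objectID: string
  weight: { pageRank: number; level: number; position: number }
}

// desc(weight.pageRank), then desc(weight.level), then asc(weight.position).
function compareByCustomRanking(a: WeightedRecord, b: WeightedRecord): number {
  return (
    b.weight.pageRank - a.weight.pageRank ||
    b.weight.level - a.weight.level ||
    a.weight.position - b.weight.position
  )
}

// Invented sample records that are assumed to tie on every earlier criterion.
const tiedRecords: WeightedRecord[] = [
  { objectID: 'a', weight: { pageRank: 0, level: 100, position: 2 } },
  { objectID: 'b', weight: { pageRank: 10, level: 70, position: 5 } },
  { objectID: 'c', weight: { pageRank: 0, level: 100, position: 0 } },
  { objectID: 'd', weight: { pageRank: 0, level: 90, position: 0 } },
]

const orderedIds = [...tiedRecords].sort(compareByCustomRanking).map((record) => record.objectID)
console.log(orderedIds)
```

A criterion such as desc(weight.popularity) would compare a field that no record carries, so it could never separate two records, which matches the description of it as dead.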

Why availableSDKs is intentionally not faceted: it powers the per-result SDK icons, but the client only retrieves it (Search.tsx SDKsIcon); it never filters or counts on it, and retrieval is independent of faceting. So the availableSDKs facet that existed on dev_docs was inert, and dropping it doesn't touch the icons. branch/record_batch are filterOnly because the stale-record cleanup filters via facetFilters and never uses facet counts.

Effect on the next index run: adds sdk faceting everywhere, normalizes branch/record_batch to filterOnly, drops the stray availableSDKs facet and the dead customRanking entry on dev_docs, and locks searchableAttributes/ranking to the tested values — leaving prod/dev/test with an identical, code-defined config.

A "warn if the dashboard was edited" guard was considered and deferred: since the indexer rewrites these every run, drift self-heals on the next index, so an explicit alert is more than this needs right now.

Extend the declared settings to the remaining relevance levers so the indexer
fully owns them and indexes can't drift apart:

- searchableAttributes: the corpus + attribute priority the `attribute` ranking
  criterion (promoted above proximity) rides on. Identical across dev/prod
  today, so this just locks in the value the ranking work was tested against.
- customRanking: the final tiebreaker, normalized to the weights the indexer
  actually writes (pageRank/level/position). Drops the dead
  desc(weight.popularity) entry lingering on some indexes -- it's never written
  to records, so it ranks nothing.

Still a scoped declaration, not a full settings snapshot: setSettings is a
partial merge, so Algolia's server-managed defaults stay untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nyms do

The asymmetry read like an oversight in review. It's deliberate: synonyms must
be identical across replicas, but settings bundle ranking/customRanking, which a
standard replica may override for an alternate sort -- forwarding would clobber
it. Comment-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@NWylynko
Contributor Author

NWylynko commented Jun 2, 2026

@manovotny

Adding the settings update to the script splits how the script is understood. Before the settings update was added, the script scoped itself entirely to the runner's current branch, including the branch name in the records that were being added, which lets the search client scope itself to that branch and know that it's only getting that set of results.

But updating the index settings applies across all records in that index, not just the records scoped to the branch being updated. IMO this introduces a split in how you'd expect the script to run: either it doesn't care about the git branch and there is just a single 'dev_docs' index, or changes only affect the search items being updated, in which case you'd expect settings to change only for that branch too.

This ultimately stems from a limitation of Algolia and a desire on my part not to spin up a whole new index for every search development branch. So if this is a trade-off we are happy with then I'm happy to move ahead, but I haven't seen this addressed, so I just want to get your opinion.

…s split

Nick flagged that the script is branch-scoped for records but index-wide for
settings, which reads as a split. It's inherent (Algolia has no per-branch
settings) and accepted: shared index + branch-scoped records is the cost-
conscious middle ground; isolated settings experiments use a personal index.

- Script: scope note in the settings block explaining the split + the escape hatch.
- AGENTS.md: new "Search index (Algolia)" section capturing the indexer model —
  indexes, branch-scoping, codified settings (4 + synonyms, declare-and-overwrite),
  filterOnly faceting + the availableSDKs/retrieval gotcha, forwardToReplicas
  asymmetry, ranking↔searchableAttributes coupling, and local testing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@manovotny
Contributor

Totally fair catch — it's the sharpest point on the thread, and you're right that it was undocumented.

The way I've come to think about it: records and settings are genuinely different kinds of thing, so the two scopes are each correct rather than inconsistent. Records are data — rightly per-branch (the branch: tag + client filter is what lets us share one index). Settings/synonyms are the index's relevance contract — a property of the index, not of a branch's content. Algolia has no per-branch settings, and conceptually there shouldn't be: "how this index ranks" isn't branch-specific.

Codifying also shrinks the downside you're pointing at. Before, a run could push arbitrary hand-tuned settings index-wide; now every run converges the index to the same declared values, and any change to them is a reviewed diff. So a content branch's indexer run just re-asserts canonical settings — nothing leaks. The seam only bites a search-relevance branch (rare — like this one) that's actively changing settings, and the escape hatch already exists and is exactly what I used here: point ALGOLIA_INDEX_NAME at a personal throwaway index. You pay for an extra index only while experimenting with settings, not per branch.

Full per-branch isolation is the "pure" answer — if usage/cost weren't a factor I'd spin up an index per branch and call it done. But that's a lot of indexes (and money) for a rare need, so shared dev_docs + branch-scoped records + a personal index for settings experiments is the middle ground I'd rather steward toward.

So — accepted trade-off, agreed. And to your "haven't seen this addressed": it is now, in f85cd12 — a scope note in the settings block, plus a new "Search index (Algolia)" section in AGENTS.md spelling out the split, the codified settings, and the personal-index escape hatch.
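
The two scopes discussed in this thread could be sketched like this. Both helpers are hypothetical, the `dev_docs` fallback is an assumption, and the exact filters the stale-record cleanup uses are not shown in this PR; the sketch only assumes that `facetFilters` entries are combined with AND and that a `-` prefix negates a value.

```typescript
// Index scope: settings apply to whichever index the run targets.
// Pointing ALGOLIA_INDEX_NAME at a personal index isolates settings experiments.
function resolveIndexName(env: Record<string, string | undefined>): string {
  return env['ALGOLIA_INDEX_NAME'] ?? 'dev_docs' // fallback value is an assumption
}

// Branch scope: records are tagged per branch, so cleanup only targets
// this branch's records that were not written by the current batch.
function staleRecordFilters(branch: string, currentBatch: string): string[] {
  return [`branch:${branch}`, `record_batch:-${currentBatch}`]
}

console.log(resolveIndexName({ ALGOLIA_INDEX_NAME: 'personal_docs_sandbox' }))
console.log(staleRecordFilters('nick/include-guides-sdk-in-search', 'batch-42'))
```

Under these assumptions a content branch shares the index and its settings, while a settings experiment swaps only the index name and leaves the record logic unchanged.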



2 participants