feat: framework logic-change detection for live components #2192
pyjuan91 wants to merge 4 commits into
Persist a live component's subtree logic-dependency set `S` (own fp ∪ all descendants) as it processes, so a later run can detect whether the processing logic changed without a second pass over persisted memo entries. Foundation for framework-level logic-change detection on live components (cocoindex-io#2124).

- Surface a root build's rolled-up deps without touching the readiness path: `run_in_background` takes an optional `oneshot` outcome sink, used when a root has no parent readiness guard to roll up to. The foreground mount path keeps rolling up via the guard and passes None.
- `update_full` recomputes and replaces `S` (edited-away fingerprints drop out); an incremental `update` op extends it with the item's subtree deps, skipping the write once they are already covered.
- Store `S` under a framework-reserved Symbol key in the Live keyspace, encoded as a sorted fingerprint vec. Dropped sink ⇒ no persist (failure-safe).

Write path only; reading `S` to gate a scan skip is a follow-up.
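The replace/extend semantics of this commit can be modeled in a few lines. This is a simplified Python sketch, not the Rust implementation: `LogicDepsStore`, its fields, the JSON encoding, and the string fingerprints are hypothetical stand-ins. Only the behavior follows the commit message: a full update replaces `S`, an incremental update extends it and skips the write when the item's deps are already covered, and the set is encoded as a sorted fingerprint list.

```python
import json
from typing import Optional


class LogicDepsStore:
    """Toy model of the persisted subtree dependency set S (hypothetical names)."""

    def __init__(self) -> None:
        self.persisted: Optional[bytes] = None  # encoded S; None until first commit
        self.writes = 0  # number of store writes, tracked for illustration

    def _load(self) -> set:
        return set(json.loads(self.persisted)) if self.persisted else set()

    def _save(self, deps: set) -> None:
        # Sort before encoding so equal sets always produce identical bytes.
        self.persisted = json.dumps(sorted(deps)).encode()
        self.writes += 1

    def update_full(self, deps: set) -> None:
        # Full scan: replace S, so fingerprints that were edited away drop out.
        self._save(set(deps))

    def update(self, item_deps: set) -> None:
        # Incremental item: extend S, skipping the write if already covered.
        current = self._load()
        if item_deps <= current:
            return
        self._save(current | item_deps)
```

In this model a streamed item whose code is already in `S` costs no write, while a later `update_full` shrinks `S` back to exactly what the fresh scan reported.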
…tion

Add the read side of framework-level logic-change detection for live components (cocoindex-io#2124): a predicate that reports whether a component's processing logic is unchanged since its last committed scan, so a durable connector can gate its startup full scan on it.

`LiveComponentController::processing_unchanged()` reads the persisted subtree dependency set `S` and checks that every fingerprint is still registered in the current logic set. It is failure-safe: it returns false when no scan was ever committed, when the stored value can't be decoded, or when any dependency's code changed (each means "re-scan").

Surfaced through PyO3 as `processing_unchanged_async`, then on `LiveComponentOperator`/`LiveComponentSubscriber` as `processing_unchanged()` so a connector pairs it with its own durable cursor: `<durable cursor> and await subscriber.processing_unchanged()`.

Tests cover first-run (no S -> false), unchanged across runs (-> true), and a simulated child-code change (-> false).
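As a rough illustration of the failure-safe predicate, here is a Python sketch under stated assumptions: the free function, the JSON encoding of `S`, and the `registered` set of current fingerprints are all hypothetical, and the real check lives on `LiveComponentController` in Rust. The point is only that every uncertain case resolves to `False`, which means "re-scan".

```python
import json
from typing import Optional


def processing_unchanged(stored: Optional[bytes], registered: set) -> bool:
    """True only if every persisted fingerprint is still registered (hypothetical model)."""
    if stored is None:
        return False  # no scan was ever committed
    try:
        deps = json.loads(stored)
    except ValueError:
        return False  # stored value cannot be decoded
    if not isinstance(deps, list) or not all(isinstance(d, str) for d in deps):
        return False  # decoded, but not the expected shape
    # Any fingerprint missing from the current logic set means code changed.
    return all(d in registered for d in deps)
```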
Replace the OCI connector's manual `logic_version` skip-scan opt-in (cocoindex-io#2116) with the framework-computed signal from cocoindex-io#2124. The live view now gates its startup-scan skip on `durable_stream and await subscriber.processing_unchanged()`, with no hand-maintained version string and no stale-state-on-forgotten-bump footgun.

- `list_objects(..., logic_version=...)` → `list_objects(..., durable_stream=...)`: a bool the user sets to assert the stream durably replays its backlog. The logic-change check is now automatic; durability stays the user's responsibility (a LiveStream exposes no cursor to detect it).
- Drop `_SCAN_VERSION_KEY` and the committed-version read/write dance; the framework persists the subtree dependency set itself.
- Tests: mock subscriber drops the committed-state version simulation for a controllable `processing_unchanged()`; the four version-matching cases collapse to three (not-durable scans, durable+changed scans, durable+unchanged skips).
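The three test cases named above can be sketched with a controllable test double. This is a minimal illustration, not the connector's code: `FakeSubscriber` and `should_skip_startup_scan` are hypothetical names, and only the gate expression `durable_stream and await subscriber.processing_unchanged()` comes from the commit message.

```python
import asyncio


class FakeSubscriber:
    """Hypothetical test double with a controllable processing_unchanged()."""

    def __init__(self, unchanged: bool) -> None:
        self._unchanged = unchanged
        self.calls = 0  # how often the framework signal was consulted

    async def processing_unchanged(self) -> bool:
        self.calls += 1
        return self._unchanged


async def should_skip_startup_scan(durable_stream: bool, subscriber: FakeSubscriber) -> bool:
    # `and` short-circuits: a non-durable stream never consults the signal.
    return durable_stream and await subscriber.processing_unchanged()


async def demo() -> list:
    return [
        await should_skip_startup_scan(False, FakeSubscriber(True)),   # not durable: scan
        await should_skip_startup_scan(True, FakeSubscriber(False)),   # durable, changed: scan
        await should_skip_startup_scan(True, FakeSubscriber(True)),    # durable, unchanged: skip
    ]
```

Running `asyncio.run(demo())` yields one boolean per case; only the last case allows the skip.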
Bring the Rust SDK to parity with the Python SDK's live logic-change detection (cocoindex-io#2124):

- Expose processing_unchanged() on LiveComponentOperator and the LiveMapSubscriber delegate, calling the core controller predicate.
- Rewire the OCI live walker from the manual logic_version string to durable_stream: bool, skipping the startup scan on reruns only when the user asserts a durable stream and the framework reports the processing logic unchanged. Removes OCI_SCAN_VERSION_KEY and the per-scan version write.
```rust
let encoded = encode_logic_deps(deps)?;
component
    .app_ctx()
    .app_store()
    .write_user_state_standalone(
        component.stable_path(),
        db_schema::StateKind::Live,
        &logic_deps_state_key(),
        &encoded,
    )
    .await
```
I think user state may not be the right place for logic deps.
What we need to persist for change detection is pretty much the same as ComponentMemoizationInfo, and the intention matches its existing usage: it keeps the information needed to validate whether a regular component's last execution is still valid, and our purpose is the same.
So I think we can probably just reuse ComponentMemoizationInfo and the same DB entry for it.
User state for live components is intended to store state related to specific live-component logic. The live-component logic will decide whether to persist the ComponentMemoizationInfo and whether the last persisted ComponentMemoizationInfo is still valid, but neither belongs in user state.
Summary
Closes #2124. Framework-computed logic-change detection for live components, replacing #2116's hand-maintained `logic_version` with a signal the engine computes itself.
This completes D1's live-component scope. It builds on the already-merged #2142 (the prerequisite fix that makes stored memos subtree-complete across the mount boundary); together they deliver the issue's proposed direction end to end. The one carve-out is processing-affecting config/arg values (e.g. `chunk_size=512→1024`), which the issue itself split to #2126 as a documented known gap; `processing_unchanged()` is the signal #2126 ANDs an args-key check into.
The skip gate the issue describes, `bootstrap_done ∧ durable ∧ logic_unchanged`, lands as `durable_stream and await processing_unchanged()`: `processing_unchanged()` folds in both "a scan has bootstrapped" (false when no committed dep-set `S` exists) and "logic unchanged" (`all_contained_with_env(S)`); durability stays connector-owned via `durable_stream`.

What's included
Core (`rust/core`)
- `run_in_background` gains an optional oneshot `outcome_sink`, so a live root build (which has no parent readiness guard) can surface its rolled-up subtree `logic_deps` instead of dropping them.
- `update_full` REPLACEs the persisted aggregate set `S`; incremental `update` EXTENDs it.
- `S` lives in the `Live` keyspace under a framework-reserved key (`sys/live_logic_deps`).
- `LiveComponentController::processing_unchanged()` reads `S` and checks every fingerprint is still registered. Failure-safe `false` (no `S`, decode error, or any dep's code changed ⇒ re-scan).
Python SDK (`rust/py` + `python/`)
- `processing_unchanged()` exposed via PyO3 → `LiveComponentOperator` / `LiveMapSubscriber`.
- `logic_version: str` knob is replaced by `durable_stream: bool`; skip-scan is now `durable_stream and await subscriber.processing_unchanged()`. `_SCAN_VERSION_KEY` and the version read/write dance are removed.
Rust SDK (`rust/sdk/cocoindex`), kept at parity with the Python SDK
- `processing_unchanged()` on the operator + subscriber.
- `logic_version` → `durable_stream`, `OCI_SCAN_VERSION_KEY` removed.
Design notes (the three things flagged on the issue)
- Deps ride a dedicated oneshot `outcome_sink` out of `run_in_background`, separate from the `HandleOutcome`/`mark_ready` readiness+error machinery (unchanged). A live root has no parent readiness guard, so without this its now-populated `ComponentRunOutcome.logic_deps` were dropped at task end.
- `update_full` recomputes and replaces `S` (edited-away fingerprints drop out); incremental `update` extends `S` (a streamed item is an existing item on the next restart, so its code must be in `S`); delete is a no-op.
- A hit contributes its full subtree closure for free: an unchanged `process_file` hit still surfaces `process_chunk`'s fingerprint, so editing a mounted child is correctly not skipped.
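The subtree-closure point can be illustrated with a toy memo table. This is a sketch under assumptions, not the engine's data model: the `Memo` dataclass, the `fp:` fingerprint strings, and the `calls` log are hypothetical, and the real `process_file` / `process_chunk` are user components rather than this stub. It shows why a hit that never re-runs the child still reports the child's fingerprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memo:
    result: str
    logic_deps: frozenset  # own fingerprint plus every descendant's


def process_file(memos: dict, key: str, calls: list) -> Memo:
    hit = memos.get(key)
    if hit is not None:
        # Hit: the child is not re-run, but the stored closure is still reported.
        return hit
    calls.append(key)  # miss: pretend to run the child and roll its deps up
    memo = Memo(f"processed {key}", frozenset({"fp:process_file", "fp:process_chunk"}))
    memos[key] = memo
    return memo
```

If `process_chunk` is later edited, its old fingerprint is no longer registered, so a dependency set built from hits alone still fails the containment check and the scan is not skipped.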
Scope / non-goals
The framework owns only the logic-change signal. Durability stays connector-owned: the user asserts it via `durable_stream`, since a `LiveStream` exposes no cursor for the framework to detect, and the connector keeps owning the skip wiring. This PR is not generic live-component memoization or per-item skip.
Known gap (tracked separately)
Detection covers code changes, not config/arg values (`chunk_size=512→1024` moves no fingerprint). That's the #2126 follow-up; `processing_unchanged()` is the signal #2126 ANDs an args-key equality into.
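To make the gap concrete, here is one way an args-key check could be ANDed into the signal. Everything in this sketch is an assumption: #2126 is not implemented here, and `args_key`, `can_skip_scan`, and the SHA-256-over-canonical-JSON digest are hypothetical choices for illustration only.

```python
import hashlib
import json
from typing import Optional


def args_key(config: dict) -> str:
    """Hypothetical stable digest of processing-affecting config values."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def can_skip_scan(logic_unchanged: bool, stored_key: Optional[str], config: dict) -> bool:
    # Skip only if code is unchanged AND the config digest matches the stored one.
    return logic_unchanged and stored_key == args_key(config)
```

With this shape, a change from `chunk_size=512` to `chunk_size=1024` alters the digest and forces a scan even though no fingerprint moved.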
Docs
`durable_stream` is user-facing and, as the D1 end state, legitimately documentable, but I've left the OCI connector docs out of this PR since docs are usually handled separately. Happy to add a section here or as a follow-up, whichever fits your process.
Test plan
- `cargo test` (core + SDK), `uv run mypy`, `uv run pytest python/`: green
- `prek run --all-files`: all hooks pass
- `processing_unchanged()` across runs (True when unchanged, False after a simulated logic change, False on first run); OCI skip-scan (not-durable always scans / durable scans on logic change / durable skips when unchanged)