
feat: framework logic-change detection for live components #2192

Open
pyjuan91 wants to merge 4 commits into cocoindex-io:main from pyjuan91:feat/live-logic-deps-aggregation

@pyjuan91

Contributor

Summary

Closes #2124. Framework-computed logic-change detection for live
components, replacing #2116's hand-maintained logic_version with a signal
the engine computes itself.

This completes D1's live-component scope. It builds on the already-merged
#2142 (the prerequisite fix that makes stored memos subtree-complete across
the mount boundary); together they deliver the issue's proposed direction
end to end. The one carve-out is processing-affecting config/arg values
(e.g. chunk_size=512→1024), which the issue itself split to #2126 as
a documented known gap — processing_unchanged() is the signal #2126 ANDs
an args-key check into.

The skip gate the issue describes, bootstrap_done ∧ durable ∧ logic_unchanged, lands as durable_stream and await processing_unchanged(). processing_unchanged() folds in both "a scan has bootstrapped" (false when no committed dep-set S exists) and "logic unchanged" (all_contained_with_env(S)); durability stays connector-owned via durable_stream.
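A minimal synchronous sketch of that gate, assuming the connector holds the durable_stream flag and a callable standing in for the (async, in both SDKs) processing_unchanged(). should_skip_scan is a hypothetical helper name, not an SDK function. The point it shows: Rust's && short-circuits, so the framework check is only consulted once the user has asserted durability.

```rust
/// Hypothetical helper: may the connector skip its startup full scan?
/// `processing_unchanged` stands in for the framework predicate (async in
/// the real SDKs). `&&` short-circuits, so the predicate is never evaluated
/// unless the user asserted a durable stream.
fn should_skip_scan(durable_stream: bool, processing_unchanged: impl FnOnce() -> bool) -> bool {
    durable_stream && processing_unchanged()
}

fn main() {
    // Not durable: always scan, and the framework check is never consulted.
    let mut consulted = false;
    let skip = should_skip_scan(false, || {
        consulted = true;
        true
    });
    assert!(!skip && !consulted);

    // Durable, but logic changed or no committed S yet: scan.
    assert!(!should_skip_scan(true, || false));

    // Durable and logic unchanged: skip the scan.
    assert!(should_skip_scan(true, || true));
}
```

These are the same three OCI cases listed in the test plan (not-durable scans, durable+changed scans, durable+unchanged skips).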

What's included

Core (rust/core)

  • run_in_background gains an optional oneshot outcome_sink, so a live
    root build (which has no parent readiness guard) can surface its
    rolled-up subtree logic_deps instead of dropping them.
  • update_full REPLACEs the persisted aggregate set S; incremental
    update EXTENDs it. S lives in the Live keyspace under a
    framework-reserved key (sys/live_logic_deps).
  • LiveComponentController::processing_unchanged() reads S and checks
    every fingerprint is still registered. It is failure-safe: it returns
    false when there is no S, on a decode error, or when any dep's code
    changed (⇒ re-scan).
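A sketch of the read-side predicate's failure-safe shape, under loudly stated assumptions: fingerprints are modeled as u64, S is assumed to be stored as concatenated little-endian u64s (the PR only says "a sorted fingerprint vec"; the real byte format is not shown), and the "currently registered" logic set is a plain HashSet. decode_logic_deps and encode here are illustrative stand-ins, not the engine's functions.

```rust
use std::collections::HashSet;

/// Hypothetical fingerprint type; the engine's real fingerprint is not a u64.
type Fingerprint = u64;

/// Assumed encoding: concatenated little-endian u64 fingerprints.
fn encode(deps: &[Fingerprint]) -> Vec<u8> {
    deps.iter().flat_map(|fp| fp.to_le_bytes()).collect()
}

/// Decode S; a length that is not a multiple of 8 counts as a decode error.
fn decode_logic_deps(bytes: &[u8]) -> Option<Vec<Fingerprint>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(8)
            .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
            .collect(),
    )
}

/// Failure-safe: false when no S was committed, when S cannot be decoded,
/// or when any fingerprint in S is no longer registered.
fn processing_unchanged(stored: Option<&[u8]>, registered: &HashSet<Fingerprint>) -> bool {
    match stored.and_then(decode_logic_deps) {
        Some(deps) => deps.iter().all(|fp| registered.contains(fp)),
        None => false,
    }
}

fn main() {
    let registered: HashSet<Fingerprint> = [1, 2, 3].into_iter().collect();
    assert!(!processing_unchanged(None, &registered)); // first run: no S
    assert!(!processing_unchanged(Some(&[0u8; 5][..]), &registered)); // decode error
    let s = encode(&[1, 3]);
    assert!(processing_unchanged(Some(s.as_slice()), &registered)); // unchanged
    let s = encode(&[1, 4]); // fingerprint 4 is gone: some dep's code changed
    assert!(!processing_unchanged(Some(s.as_slice()), &registered));
}
```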

Python SDK (rust/py + python/)

  • processing_unchanged() exposed via PyO3 → LiveComponentOperator /
    LiveMapSubscriber.
  • OCI connector: the manual logic_version: str knob is replaced by
    durable_stream: bool; skip-scan is now durable_stream and await subscriber.processing_unchanged(). _SCAN_VERSION_KEY and the version
    read/write dance are removed.

Rust SDK (rust/sdk/cocoindex) — kept at parity with the Python SDK

  • processing_unchanged() on the operator + subscriber.
  • OCI walker: logic_version → durable_stream; OCI_SCAN_VERSION_KEY
    removed.

Design notes (the three things flagged on the issue)

  • Flows up without entangling live readiness. The rolled-up subtree
    deps ride a dedicated oneshot outcome_sink out of run_in_background,
    separate from the HandleOutcome / mark_ready readiness+error
    machinery (unchanged). A live root has no parent readiness guard, so
    without this its now-populated ComponentRunOutcome.logic_deps were
    dropped at task end.
  • update_full vs. incremental. update_full recomputes and
    replaces S (edited-away fingerprints drop out); incremental
    update extends S (a streamed item is an existing item on the
    next restart, so its code must be in S); delete is a no-op.
  • Cache hits. Since #2142 (fix(core): propagate logic deps across the
    mount boundary) made stored memos subtree-complete, a memo hit
    contributes its full subtree closure for free: an unchanged
    process_file hit still surfaces process_chunk's fingerprint, so
    editing a mounted child is correctly not skipped.
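To illustrate the "flows up via a dedicated sink" note, here is a sketch that uses std::sync::mpsc::sync_channel(1) as a synchronous stand-in for the async oneshot the PR describes. The run_in_background and deps_to_persist functions below are simplified, hypothetical models, not the engine's signatures. It shows the two properties the design relies on: the root receives the rolled-up deps out of band, and a dropped sender (failed build) means nothing gets persisted.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

type Fingerprint = u64;
type Deps = BTreeSet<Fingerprint>;

/// Simplified stand-in for run_in_background: runs `build` on a thread and,
/// when an outcome sink is supplied, sends the rolled-up subtree deps on it.
fn run_in_background(
    build: impl FnOnce() -> Option<Deps> + Send + 'static,
    outcome_sink: Option<SyncSender<Deps>>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let outcome = build();
        if let (Some(sink), Some(deps)) = (outcome_sink, outcome) {
            // The receiver may already be gone; ignore that here.
            let _ = sink.send(deps);
        }
        // If either side was None, any sender is dropped unsent.
    })
}

/// Live-root side: persist S only if the build actually reported deps.
fn deps_to_persist(rx: Receiver<Deps>) -> Option<Deps> {
    rx.recv().ok()
}

fn main() {
    // Root build with no parent readiness guard: pass a sink.
    let (tx, rx) = sync_channel(1);
    run_in_background(|| Some(Deps::from([7, 9])), Some(tx)).join().unwrap();
    assert_eq!(deps_to_persist(rx), Some(Deps::from([7, 9])));

    // Failed build: the sink is dropped unsent, so nothing is persisted.
    let (tx, rx) = sync_channel(1);
    run_in_background(|| None, Some(tx)).join().unwrap();
    assert_eq!(deps_to_persist(rx), None);

    // Foreground mount path: rolls up via its guard instead, so no sink.
    run_in_background(|| Some(Deps::from([1])), None).join().unwrap();
}
```

Keeping the sink separate from the readiness/error machinery means the existing mark_ready path is untouched; the channel carries only the deps.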

Scope / non-goals

The framework owns only the logic-change signal. Durability stays
connector-owned: the user asserts it via durable_stream, since a
LiveStream exposes no cursor for the framework to detect, and the connector
keeps owning the skip wiring. This is not generic live-component memoization
or per-item skip.

Known gap (tracked separately)

Detection covers code changes, not config/arg values
(chunk_size=512→1024 moves no fingerprint). That's the #2126
follow-up — processing_unchanged() is the signal #2126 ANDs an args-key
equality into.
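Purely to illustrate what "ANDing an args-key equality into processing_unchanged()" could mean (the #2126 design is not part of this PR): args_key and can_skip_with_args are invented names, and the args key is assumed to be a canonical string of the processing-affecting arguments stored alongside the last committed scan.

```rust
/// Hypothetical args key: a canonical string of the processing-affecting args.
fn args_key(chunk_size: usize) -> String {
    format!("chunk_size={chunk_size}")
}

/// Hypothetical follow-up gate: code unchanged AND same args as the last
/// committed scan. A missing stored key fails safe to "re-scan".
fn can_skip_with_args(
    processing_unchanged: bool,
    stored_args_key: Option<&str>,
    current_args_key: &str,
) -> bool {
    processing_unchanged && stored_args_key == Some(current_args_key)
}

fn main() {
    let stored = args_key(512);
    // Changing only chunk_size moves no fingerprint...
    let code_unchanged = true;
    // ...so the args-key comparison is what forces the re-scan.
    assert!(!can_skip_with_args(code_unchanged, Some(stored.as_str()), &args_key(1024)));
    assert!(can_skip_with_args(code_unchanged, Some(stored.as_str()), &args_key(512)));
}
```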

Docs

durable_stream is user-facing and, as the D1 end state, legitimately
documentable — but I've left the OCI connector docs out of this PR since
docs are usually handled separately. Happy to add a section here or as a
follow-up, whichever fits your process.

Test plan

  • cargo test (core + SDK), uv run mypy, uv run pytest python/: green
  • Full prek run --all-files: all hooks pass
  • New coverage: live memo+mount subtree processing_unchanged() across
    runs (True when unchanged, False after a simulated logic change, False on
    first run); OCI skip-scan (not-durable always scans / durable scans on
    logic change / durable skips when unchanged)
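The memo+mount case can be modeled in miniature. This sketch assumes a toy component tree (Component, subtree_deps, and u64 fingerprints are all hypothetical) and an in-memory memo map standing in for persisted, subtree-complete memos. It shows why a process_file memo hit still yields process_chunk's fingerprint, and why editing the child then makes the committed S fail the containment check.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

type Fingerprint = u64;

/// Hypothetical component: its own code fingerprint plus mounted children.
struct Component {
    fp: Fingerprint,
    children: Vec<&'static str>,
}

/// Subtree logic deps: own fingerprint ∪ every descendant's deps.
/// A memo hit returns the stored, subtree-complete closure as-is.
fn subtree_deps(
    name: &str,
    tree: &HashMap<&str, Component>,
    memo: &mut HashMap<String, BTreeSet<Fingerprint>>,
) -> BTreeSet<Fingerprint> {
    if let Some(hit) = memo.get(name) {
        return hit.clone();
    }
    let c = &tree[name];
    let mut deps = BTreeSet::from([c.fp]);
    for child in &c.children {
        deps.extend(subtree_deps(child, tree, memo));
    }
    memo.insert(name.to_string(), deps.clone());
    deps
}

fn main() {
    let mut tree = HashMap::new();
    tree.insert("process_file", Component { fp: 10, children: vec!["process_chunk"] });
    tree.insert("process_chunk", Component { fp: 20, children: vec![] });

    // Run 1: compute and "commit" S; process_file's closure is memoized.
    let mut memo = HashMap::new();
    let s = subtree_deps("process_file", &tree, &mut memo);
    assert_eq!(s, BTreeSet::from([10, 20]));

    // Run 2: process_file is a memo hit, yet the child's fp still surfaces.
    assert!(subtree_deps("process_file", &tree, &mut memo).contains(&20));

    // Edit process_chunk: fp 20 is no longer registered, so S is not contained.
    tree.get_mut("process_chunk").unwrap().fp = 21;
    let registered: HashSet<Fingerprint> = tree.values().map(|c| c.fp).collect();
    assert!(!s.iter().all(|fp| registered.contains(fp))); // => re-scan
}
```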

pyjuan91 added 4 commits June 22, 2026 18:25
Persist a live component's subtree logic-dependency set `S` (own fp ∪
all descendants) as it processes, so a later run can detect whether the
processing logic changed without a second pass over persisted memo
entries. Foundation for framework-level logic-change detection on live
components (cocoindex-io#2124).

- Surface a root build's rolled-up deps without touching the readiness
  path: `run_in_background` takes an optional `oneshot` outcome sink,
  used when a root has no parent readiness guard to roll up to. The
  foreground mount path keeps rolling up via the guard and passes None.
- `update_full` recomputes and replaces `S` (edited-away fingerprints
  drop out); an incremental `update` op extends it with the item's
  subtree deps, skipping the write once they are already covered.
- Store `S` under a framework-reserved Symbol key in the Live keyspace,
  encoded as a sorted fingerprint vec. Dropped sink ⇒ no persist
  (failure-safe).

Write path only; reading `S` to gate a scan skip is a follow-up.
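A sketch of the write-path semantics described above, modeling only the set behavior. LogicDepsStore is a hypothetical in-memory type; the real code persists S through write_user_state_standalone under a reserved key, which is not modeled here. A BTreeSet keeps fingerprints ordered, so the "sorted fingerprint vec" encoding falls out of iteration order.

```rust
use std::collections::BTreeSet;

type Fingerprint = u64;

/// Hypothetical in-memory model of the persisted aggregate set S.
#[derive(Default)]
struct LogicDepsStore {
    s: BTreeSet<Fingerprint>,
    writes: usize, // counts persisted writes, to show skipped no-op writes
}

impl LogicDepsStore {
    /// update_full: recompute and REPLACE S, so edited-away fingerprints drop out.
    fn on_update_full(&mut self, subtree_deps: BTreeSet<Fingerprint>) {
        self.s = subtree_deps;
        self.writes += 1;
    }

    /// Incremental update: EXTEND S with the item's subtree deps, skipping
    /// the write when they are already covered.
    fn on_update(&mut self, item_deps: &BTreeSet<Fingerprint>) {
        if item_deps.is_subset(&self.s) {
            return;
        }
        self.s.extend(item_deps.iter().copied());
        self.writes += 1;
    }

    /// Encoding as a sorted fingerprint vec (BTreeSet iterates in order).
    fn encoded(&self) -> Vec<Fingerprint> {
        self.s.iter().copied().collect()
    }
}

fn main() {
    let mut store = LogicDepsStore::default();
    store.on_update_full(BTreeSet::from([3, 1, 2]));
    assert_eq!(store.encoded(), vec![1, 2, 3]);

    store.on_update(&BTreeSet::from([2, 3])); // already covered: no write
    assert_eq!(store.writes, 1);

    store.on_update(&BTreeSet::from([2, 5])); // new fp 5: extend and write
    assert_eq!(store.encoded(), vec![1, 2, 3, 5]);
    assert_eq!(store.writes, 2);

    store.on_update_full(BTreeSet::from([1, 5])); // fps 2 and 3 edited away
    assert_eq!(store.encoded(), vec![1, 5]);
}
```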
…tion

Add the read side of framework-level logic-change detection for live
components (cocoindex-io#2124): a predicate that reports whether a component's
processing logic is unchanged since its last committed scan, so a durable
connector can gate its startup full scan on it.

`LiveComponentController::processing_unchanged()` reads the persisted
subtree dependency set `S` and checks every fingerprint is still
registered in the current logic set. Failure-safe — returns false when no
scan was ever committed, when the stored value can't be decoded, or when
any dependency's code changed (each means "re-scan").

Surfaced through PyO3 as `processing_unchanged_async`, then on
`LiveComponentOperator`/`LiveComponentSubscriber` as `processing_unchanged()`
so a connector pairs it with its own durable cursor:
`<durable cursor> and await subscriber.processing_unchanged()`.

Tests cover first-run (no S -> false), unchanged across runs (-> true),
and a simulated child-code change (-> false).

Replace the OCI connector's manual `logic_version` skip-scan opt-in
(cocoindex-io#2116) with the framework-computed signal from cocoindex-io#2124. The live view now
gates its startup-scan skip on `durable_stream and await
subscriber.processing_unchanged()` — no hand-maintained version string,
no stale-state-on-forgotten-bump footgun.

- `list_objects(..., logic_version=...)` → `list_objects(..., durable_stream=...)`:
  a bool the user sets to assert the stream durably replays its backlog.
  The logic-change check is now automatic; durability stays the user's
  responsibility (a LiveStream exposes no cursor to detect it).
- Drop `_SCAN_VERSION_KEY` and the committed-version read/write dance;
  the framework persists the subtree dependency set itself.
- Tests: mock subscriber drops the committed-state version simulation for
  a controllable `processing_unchanged()`; the four version-matching
  cases collapse to three (not-durable scans, durable+changed scans,
  durable+unchanged skips).

Bring the Rust SDK to parity with the Python SDK's live logic-change
detection (cocoindex-io#2124):

- Expose processing_unchanged() on LiveComponentOperator and the
  LiveMapSubscriber delegate, calling the core controller predicate.
- Rewire the OCI live walker from the manual logic_version string to
  durable_stream: bool — skip the startup scan on reruns only when the
  user asserts a durable stream and the framework reports the processing
  logic unchanged. Removes OCI_SCAN_VERSION_KEY and the per-scan version
  write.
badmonster0 requested a review from georgeh0 June 22, 2026 15:46
Comment on lines +1221 to +1231
let encoded = encode_logic_deps(deps)?;
component
    .app_ctx()
    .app_store()
    .write_user_state_standalone(
        component.stable_path(),
        db_schema::StateKind::Live,
        &logic_deps_state_key(),
        &encoded,
    )
    .await

Member


I think user state may not be the right place for logic deps.

What we need to persist for change detection is pretty much the same as ComponentMemoizationInfo, and the intention is also similar to its existing usage: it keeps the information needed to validate whether a regular component's last execution is still valid, which is exactly our purpose here.

So I think probably we can just reuse ComponentMemoizationInfo and reuse the same entry in the DB for it.

User states for live components are intended to store state related to specific live-component logic. The live-component logic will decide whether to persist the ComponentMemoizationInfo and whether the last persisted ComponentMemoizationInfo is currently valid, but these don't belong in user states.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Framework-level logic-change detection for live components

2 participants