[Bug]: LanceDB index grows unbounded and fills the disk when cascade compaction/prune silently fails #315

### Area
src/everos

### What happened?
The on-disk LanceDB index under `~/.everos/.index/lancedb` can grow without bound
until it fills the entire disk, while the Markdown source of truth stays tiny.

On a self-hosted single-user instance I have hit this twice:
- `atomic_fact.lance` bloated to **318 GB** (8,974 data fragments, 1,759 versions)
  while the live data was only a few MB and the Markdown truth was ~26 MB.
- It recurred and reached **583 GB**, filling the volume to **0 bytes free** (even
  the shell started returning ENOSPC).

The root problem is that the cascade maintenance worker **swallows every
compaction/prune failure** and keeps running, so stale LanceDB versions accumulate
forever with no cap, no metric, and no user-visible signal:

- `memory/cascade/worker.py` `_run_optimize_once` wraps `optimize()`/cleanup in a
  broad `except Exception:  # never crash the daemon`, logs a
  `cascade_lancedb_optimize_failed` **warning**, and continues.
- There is no max-version / max-size cap on the index, no index-size health metric,
  and no `prune`/maintenance CLI to recover.

Once compaction is broken, prune never runs → versions pile up → disk fills →
**death spiral**: a full disk means compaction can't even write its temp scratch,
so it can never prune itself back down.

### Two failure modes that break compaction
1. **FD exhaustion (EMFILE / os error 24).** LanceDB maintenance needs ~290 FDs
   (per `docs/cascade_runbook.md`), but a daemon launched under macOS's default soft
   limit of 256 (`launchctl limit maxfiles`) hits EMFILE on every cleanup cycle.
   Logs showed thousands of `os error 24` / "Too many open files". Raising the
   launcher's `NumberOfFiles` soft limit to 8192 fixed this mode.
2. **lance list-encoding corruption (persists even after the FD fix).**
   `optimize()` dies with a lance 7.0.0 error like
   `Max offset of 648640 exceeds length of values 466149` on an `atomic_fact`
   `list<...>` column (`list.rs`). Because `optimize()` runs compaction *before*
   cleanup, it never reaches the cleanup step → reclaims nothing → unbounded growth.
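
The first failure mode can be detected (and often avoided) from inside the process at startup. A minimal sketch using only the standard-library `resource` module; `ensure_fd_limit` is a hypothetical helper, not existing EverOS code, and the 8192 target is simply the value that worked for the launcher above:

```python
import resource


def ensure_fd_limit(target: int = 8192) -> tuple[int, int]:
    """Try to raise the soft RLIMIT_NOFILE to `target`.

    Returns (soft_before, soft_after). Never raises: if the kernel
    refuses the new limit, the old one is kept and returned.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, soft

    # The soft limit may not exceed the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted <= soft:
        return soft, soft

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        # Some systems reject values above a per-process maximum.
        return soft, soft

    return soft, resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling this once before the cascade worker starts, and logging a loud error when the returned soft limit is still below the ~290 FDs the runbook mentions, would at least make the EMFILE mode visible.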

### Steps to reproduce
1. Run the EverOS daemon continuously and keep adding memories so the cascade worker
   compacts/prunes on its normal schedule.
2. Cause compaction to fail — easiest is to launch the daemon under a low FD soft
   limit (macOS default 256), or let the lance list-encoding error above occur on
   `atomic_fact`.
3. Watch `du -sh ~/.everos/.index/lancedb` climb into the tens/hundreds of GB while
   `~/.everos/evermem/**.md` stays a few tens of MB.
4. Run `grep cascade_lancedb_optimize_failed` against the logs: failures are logged
   as warnings only, and the daemon keeps serving without ever surfacing the bloat.
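
The size comparison in step 3 can also be scripted. A rough sketch, assuming the two directories named in this issue; `dir_size_bytes` and `bloat_ratio` are hypothetical helper names, and the sizes are apparent file sizes (`st_size`), so they can differ slightly from what `du` reports:

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the apparent size of all regular files under `root`."""
    if not root.is_dir():
        return 0
    total = 0
    for path in root.rglob("*"):
        try:
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
        except OSError:
            # Files can vanish while the daemon is running; skip them.
            continue
    return total


def bloat_ratio(index_dir: Path, truth_dir: Path) -> float:
    """Return index size divided by Markdown size."""
    index = dir_size_bytes(index_dir)
    truth = dir_size_bytes(truth_dir)
    if truth == 0:
        return float("inf") if index else 0.0
    return index / truth
```

For the numbers reported here, `bloat_ratio(Path.home() / ".everos/.index/lancedb", Path.home() / ".everos/evermem")` would come out in the thousands, which is the kind of threshold a watchdog could alert on.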

### Environment
- OS: macOS (Darwin), single-user self-host, LaunchAgent
- EverOS: 1.0.0 and 1.1.0 (reproduced on both)
- lance / lancedb: 7.0.0
- Markdown truth ~34 MB; index bloated to 318 GB then 583 GB

### Workaround
- When compaction is broken but the disk still has room: stop the daemon and call
  lance `cleanup_old_versions(older_than=timedelta(0), delete_unverified=True)`
  **directly** on each `*.lance` dir — this bypasses the broken compaction step that
  `Table.optimize()` runs first, and reclaims the stale versions (row counts
  unchanged).
- On a **full disk**, that direct cleanup is impossible (no scratch space). Recovery:
  stop the daemon → `rm -rf ~/.everos/.index/{lancedb,sqlite}` (the `.index` is 100%
  rebuildable; the Markdown at `~/.everos/evermem` is the truth) → restart → re-embed
  from Markdown (slow).
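
For reference, the first workaround looks roughly like this when scripted. It is a sketch, assuming the `lance` Python package is installed and the daemon is stopped (`delete_unverified=True` is only safe when no other process is writing to the dataset). `cleanup_all_tables` and its `open_dataset` parameter are hypothetical names; the parameter exists only so the loop can be exercised without a real dataset:

```python
from datetime import timedelta
from pathlib import Path
from typing import Any, Callable, Optional


def cleanup_all_tables(
    db_dir: Path,
    open_dataset: Optional[Callable[[str], Any]] = None,
) -> dict[str, Any]:
    """Call cleanup_old_versions on every `*.lance` directory under db_dir.

    Returns a mapping of table directory name to the cleanup result,
    or to the exception raised for that table.
    """
    if open_dataset is None:
        import lance  # imported lazily so the module loads without it

        open_dataset = lance.dataset

    results: dict[str, Any] = {}
    for table_dir in sorted(db_dir.glob("*.lance")):
        if not table_dir.is_dir():
            continue
        try:
            dataset = open_dataset(str(table_dir))
            # Skips compaction entirely; only removes stale versions.
            results[table_dir.name] = dataset.cleanup_old_versions(
                older_than=timedelta(0),
                delete_unverified=True,
            )
        except Exception as exc:
            # Keep going so one bad table does not block the others.
            results[table_dir.name] = exc
    return results
```

Pointing it at `Path.home() / ".everos/.index/lancedb"` mirrors what I ran by hand; comparing row counts before and after is a cheap sanity check.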

### Suggested fixes
- Add a hard cap on index version count / size, or a watchdog that prunes when the
  index greatly exceeds the Markdown footprint.
- Surface a health signal / metric when `optimize()` fails repeatedly instead of only
  a swallowed warning (e.g. expose index size + last-successful-compaction in
  `/health` or a status command).
- Ship a first-class `everos index prune` / maintenance CLI that calls
  `cleanup_old_versions` directly (works even when `optimize()` compaction is broken).
- Raise the FD soft limit in the bundled launchers/docs so EMFILE can't silently
  break maintenance out of the box.
- Fix or work around the underlying lance list-encoding compaction bug
  (pin/upgrade lance, or rewrite the affected `list<...>` column).
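
To illustrate the health-signal suggestion, here is one possible shape for tracking repeated maintenance failures without crashing the daemon. Everything in it (`MaintenanceHealth`, the threshold of 3, the snapshot keys) is hypothetical and not taken from the EverOS code base:

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MaintenanceHealth:
    """Counts consecutive maintenance failures instead of discarding them."""

    max_consecutive_failures: int = 3
    consecutive_failures: int = 0
    last_success_ts: Optional[float] = None
    last_error: Optional[str] = None

    def run(self, step: Callable[[], None]) -> bool:
        """Run one maintenance step; record the outcome, never raise."""
        try:
            step()
        except Exception as exc:
            self.consecutive_failures += 1
            self.last_error = f"{type(exc).__name__}: {exc}"
            return False
        self.consecutive_failures = 0
        self.last_error = None
        self.last_success_ts = time.time()
        return True

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.max_consecutive_failures

    def snapshot(self) -> dict:
        """Data a /health endpoint or status command could expose."""
        return {
            "healthy": self.healthy,
            "consecutive_failures": self.consecutive_failures,
            "last_success_ts": self.last_success_ts,
            "last_error": self.last_error,
        }
```

The daemon still survives each failure, but after a few bad cycles `healthy` flips to `False` and the last error is available to whoever is looking, rather than sitting in a warning log.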

