[Bug]: everOS /health hangs after 2-5 successful KB uploads — single asyncio event loop deadlocks between cascade + extract_atomic_facts

# everos server start hangs after N successful KB uploads — single asyncio event loop appears to deadlock during cascade + extract_atomic_facts + LanceDB write

## Summary

When ingesting multiple knowledge-base (KB) documents in sequence (via `/api/v1/knowledge/documents`), everOS reliably hangs after a small number of successful uploads (2–5 in our runs). `/health` then returns 502 or times out, even though the preceding uploads returned HTTP 201. The process never recovers and must be killed.

This appears to be a deadlock in the single asyncio event loop between:
1. The HTTP request handler waiting on `extract_knowledge` to finish
2. The `cascade_worker` calling `extract_atomic_facts` on a freshly-written doc
3. Both eventually needing the same LanceDB write slot
4. cascade_worker holding a `processing` row in `md_change_state`, blocking its own retry chain

After a few successful chunks, the OME pipeline stops making forward progress even though llama-server reports `requests_processing=0` and all upstream LLM calls have returned.

## Why this matters

The user-visible failure mode is severe:
- `/health` stops responding → monitoring / MCP tool calls all hang
- `kb-ingest.progress` stops updating (the watchdog has no signal)
- A new client POST `/api/v1/knowledge/documents` hangs forever, never receiving a response
- Process must be hard-killed; on restart, cascade rebuild (2 s) returns but the queue still has stale `processing` rows from the prior run

Reproduction is trivial: upload more than ~5 knowledge-base documents back-to-back. The lockup happens on a single-host, single-process deployment, which is exactly the in-scope use case for v1.x.

## Observed evidence

### everOS log near the hang

```
2026-06-28T12:33:41.329Z [info] document created  doc_id=d_6e1ee6c085c1  topic_count=15
2026-06-28T12:33:41.334Z [info] POST /api/v1/knowledge/documents 201
2026-06-28T12:34:07.205Z [info] document created  doc_id=d_845129d6c5d9  topic_count=1
2026-06-28T12:34:07.212Z [info] POST /api/v1/knowledge/documents 201
2026-06-28T12:37:20.458Z [info] GET /health 200              <-- last successful /health
2026-06-28T12:38:09.[...] [error] LLMError "Request timed out."
```

### llama-server metrics during the hang

```
llamacpp:requests_processing    0    <-- LLM is idle
llamacpp:requests_deferred      0
llamacpp:n_busy_slots_per_decode 1.05
llamacpp:n_decode_total         22367
llamacpp:n_tokens_max           52811
```

The model has finished work. everOS is stuck *after* the LLM returns.

### Python state of the hung everOS process

- `Open TCP connections`: 1 ESTABLISHED from ingest client + ~5 internal
- `Threads`: 89 (vs ~50 at idle)
- `Handles`: 613
- `Memory`: ~450 MB RSS, slowly climbing
- `CPU`: ~3-5 % (only event-loop housekeeping)

A traceback captured at hang time:

```
File "...\starlette\middleware\errors.py", line 165, in __call__
  await self.app(scope, receive, _send)
LLMError: 'Request timed out.'

httpx.ReadTimeout     timeout=NOT_GIVEN
AsyncHTTP11Connection ['http://127.0.0.1:8585', CLOSED]
```

The connection to llama-server has been closed by the server side after some unknown timeout, but everOS's task is still awaiting it.

## Reproduction shape

Single-machine reproduction on Windows + Python 3.12 + EverOS 1.1.0:

1. `everos init`, then edit `<root>/everos.toml` with any OpenAI-compatible LLM endpoint (we tested with `Qwen3.5-9B-UD-Q4_K_XL` via `llama-server` and `minimax-m3` via `https://api.minimaxi.com/v1` — both reproduce)
2. `everos server start`
3. From another shell, upload 10–20 knowledge-base documents at ~10 s each via `POST /api/v1/knowledge/documents` with `multipart/form-data`
4. Watch `kb-ingest.progress` (or `GET /api/v1/knowledge/documents`)
5. After 2–5 uploads return 201, the next upload hangs
6. `curl http://127.0.0.1:8000/health` → timeout
7. Killing `everos` and restarting recovers, but the next round repeats

The Python client we used (urllib with `timeout=300`):

```python
req = urllib.request.Request(url, data=body, headers={"Content-Type": f"multipart/form-data; boundary={boundary}"}, method="POST")
with urllib.request.urlopen(req, timeout=300) as r:
    return r.status, r.read()
```

## Expected behavior

Each `POST /api/v1/knowledge/documents` should:
1. Accept the request
2. Run `extract_knowledge` (LLM call) and write the markdown + cascade entry
3. Return `201` within bounded time

`/health` should keep returning `200` throughout, even while `extract_atomic_facts` / cascade are processing in the background. The cascade worker should not block user-facing API responses.

## Suspected root cause

Based on observable behavior, the deadlock likely involves these pieces competing in the single event loop:

1. `Runner.run()` in `infra/ome/_dispatch/runner.py:128` holds the OME engine semaphore for the entire retry chain (max_retries × timeout). When the inner LLM call exceeds the httpx client timeout, the retry keeps re-entering and the semaphore never releases.

2. cascade_worker `_run_loop()` calls `extract_atomic_facts` for each newly-written doc. While processing one row, it holds an internal claim (`status='processing'` in `md_change_state`) and is awaiting an LLM call. New requests that try to ingest a doc want the same LanceDB connection pool.

3. `LanceDB` writes (knowledge_topic.lance) serialize through a single writer. cascade_worker trying to upsert + the request handler trying to upsert + `extract_atomic_facts` all touching the same table → cross-task dependency on the same async resource.

4. The semaphore is not a per-attempt timeout — if a downstream task (cascade) holds the slot and is itself awaiting the LLM, all upstream tasks wait indefinitely.

The combination of:
- single asyncio event loop
- one shared engine semaphore
- one cascade worker serializing through that semaphore
- one LanceDB writer per table

creates a deadlock window whenever the LLM call duration exceeds the request handler timeout *and* the cascade worker happens to be processing the same table.
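
The semaphore part of this hypothesis can be illustrated in isolation. The sketch below does not reproduce `runner.py`; `run_holding`, `run_per_attempt`, and `other_request` are hypothetical names, and the timeouts are shortened so it runs in a fraction of a second. It shows starvation rather than a full deadlock: when the slot is held across the whole retry chain, a second request only gets it after every retry has timed out.

```python
import asyncio


async def slow_llm_call() -> None:
    await asyncio.sleep(10)  # never finishes within the attempt timeout


async def run_holding(sem, log, retries=3, timeout=0.05):
    """Assumed pattern: the slot is held across the whole retry chain."""
    async with sem:
        for attempt in range(retries):
            try:
                await asyncio.wait_for(slow_llm_call(), timeout)
            except asyncio.TimeoutError:
                log.append(f"retry-{attempt}")


async def run_per_attempt(sem, log, retries=3, timeout=0.05):
    """Alternative: the slot is released between attempts."""
    for attempt in range(retries):
        async with sem:
            try:
                await asyncio.wait_for(slow_llm_call(), timeout)
            except asyncio.TimeoutError:
                log.append(f"retry-{attempt}")
        await asyncio.sleep(0)  # yield so a waiter can take the slot


async def other_request(sem, log):
    async with sem:
        log.append("other")


async def scenario(runner):
    sem, log = asyncio.Semaphore(1), []
    first = asyncio.create_task(runner(sem, log))
    await asyncio.sleep(0)  # let the runner take the slot first
    await asyncio.gather(first, other_request(sem, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(scenario(run_holding)))
    print(asyncio.run(scenario(run_per_attempt)))
```

With `run_holding`, the `other` entry lands after all three retries; with `run_per_attempt`, it lands right after the first one. With production values (60–300 s per attempt, several retries), the first pattern would keep every other caller waiting for minutes.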

## Workarounds we tried (all partial)

| Workaround | Effect |
|---|---|
| Increase `LLMConfig.timeout` 60→300 s | Fewer timeouts, but doesn't prevent the hang, which still happens downstream of the LLM call |
| Disable cascade scanner via `EVEROS_CASCADE_SCANNER_DISABLED=1` | Reduces noise but cascade_worker still runs |
| Reduce `-c` to give more llama-server headroom | Doesn't help — bottleneck is not LLM |
| Add more llama-server slots | Doesn't help — only 1–2 active requests in flight at hang time |
| Drop `--reasoning` on llama-server | Helps indirectly but does not eliminate the hang |

The most reliable workaround we found is to **bulk-import** the markdown files directly into `<root>/default_app/default_project/knowledge/Technology/...` and then **let cascade drain at its own pace** with no concurrent ingest. But this defeats the use of the HTTP API and doesn't work for users who want to upload programmatically.

## Environment

```
EverOS: 1.1.0 (PyPI) — also reproduced on 1.0.1
Python: 3.12
Runtime: bare-metal Windows 11
LLM: Qwen3.5-9B-UD-Q4_K_XL (local llama-server b9469, --reasoning off, --cache-type-k q8_0)
     and minimax/minimax-m3 (https://api.minimaxi.com/v1)
Embedding: 9B via llama-server --embedding --pooling last, everOS truncates to 1024-d
KB size at hang: ~840 documents, ~5 GB LanceDB knowledge_topic
Concurrent uploads: 1 (sequential ingest script, ~10 s per file, ~80 KB chunks)
```

## Possible fixes (suggestion only — maintainers decide)

This is not a small change, but a few directions look promising:

1. Make `Runner.run` (and the OME semaphore) honor a per-attempt deadline so a stuck task can't hold the semaphore indefinitely.
2. Decouple cascade_worker from the OME engine semaphore — let it run in its own bounded queue with a smaller concurrency budget, so ingest requests never wait on cascade.
3. Add a watchdog task that, every N seconds, force-releases stuck `processing` rows and resets their `retry_count`.
4. Make `/health` truly independent — currently it goes through the same middleware chain that can be blocked.

Happy to test any patches or provide more traces if helpful.
