everos server start hangs after N successful KB uploads — single asyncio event loop appears to deadlock during cascade + extract_atomic_facts + LanceDB write
Summary
When ingesting multiple KB documents in sequence (via POST /api/v1/knowledge/documents), everOS reliably hangs after a small number of successful uploads (2–5 in our runs). /health starts returning 502 or timing out, even though the earlier uploads returned HTTP 201. The process never recovers and must be killed.
This appears to be a deadlock in the single asyncio event loop between:
- The HTTP request handler waiting on extract_knowledge to finish
- The cascade_worker calling extract_atomic_facts on a freshly-written doc
- Both eventually needing the same LanceDB write slot
- cascade_worker holding a processing row in md_change_state, blocking its own retry chain
After a few successful chunks, the OME pipeline stops making forward progress even though llama-server reports requests_processing=0 and all upstream LLM calls have returned.
Why this matters
The user-visible failure mode is severe:
- /health stops responding → monitoring / MCP tool calls all hang
- kb-ingest.progress stops updating (the watchdog has no signal)
- A new client POST /api/v1/knowledge/documents hangs forever, never receiving a response
- Process must be hard-killed; on restart, cascade rebuild (2 s) returns but the queue still has stale processing rows from the prior run
Reproduction is trivial: upload more than about five knowledge-base (KB) documents back-to-back. The lockup happens on a single-host, single-process deployment, which is exactly the in-scope use case for v1.x.
Observed evidence
everOS log near the hang
2026-06-28T12:33:41.329Z [info] document created doc_id=d_6e1ee6c085c1 topic_count=15
2026-06-28T12:33:41.334Z [info] POST /api/v1/knowledge/documents 201
2026-06-28T12:34:07.205Z [info] document created doc_id=d_845129d6c5d9 topic_count=1
2026-06-28T12:34:07.212Z [info] POST /api/v1/knowledge/documents 201
2026-06-28T12:37:20.458Z [info] GET /health 200 <-- last successful /health
2026-06-28T12:38:09.[...] [error] LLMError "Request timed out."
llama-server metrics during the hang
llamacpp:requests_processing 0 <-- LLM is idle
llamacpp:requests_deferred 0
llamacpp:n_busy_slots_per_decode 1.05
llamacpp:n_decode_total 22367
llamacpp:n_tokens_max 52811
The model has finished work. everOS is stuck after the LLM returns.
Python state of the hung everOS process
Open TCP connections: 1 ESTABLISHED from ingest client + ~5 internal
Threads: 89 (vs ~50 at idle)
Handles: 613
Memory: ~450 MB RSS, slowly climbing
CPU: ~3-5 % (only event-loop housekeeping)
A traceback captured at hang time:
File "...\starlette\middleware\errors.py", line 165, in __call__
await self.app(scope, receive, _send)
LLMError: 'Request timed out.'
httpx.ReadTimeout timeout=NOT_GIVEN
AsyncHTTP11Connection ['http://127.0.0.1:8585', CLOSED]
The connection to llama-server has been closed by the server side after some unknown timeout, but everOS's task is still awaiting it.
Reproduction shape
Single-machine reproduction on Windows + Python 3.12 + EverOS 1.1.0:
- everos init, then edit <root>/everos.toml with any OpenAI-compatible LLM endpoint (we tested with Qwen3.5-9B-UD-Q4_K_XL via llama-server and minimax-m3 via https://api.minimaxi.com/v1; both reproduce)
- everos server start
- From another shell, upload 10–20 KB documents at ~10 s each via POST /api/v1/knowledge/documents with multipart/form-data
- Watch kb-ingest.progress (or GET /api/v1/knowledge/documents)
- After 2–5 uploads return 201, the next upload hangs
- curl http://127.0.0.1:8000/health → timeout
- Killing everos and restarting recovers, but the next round repeats
The Python client we used (urllib with timeout=300):
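import urllib.request

def upload(url, body, boundary):  # wrapper name is illustrative; the lines below are the original excerpt
    req = urllib.request.Request(url, data=body, headers={"Content-Type": f"multipart/form-data; boundary={boundary}"}, method="POST")
    with urllib.request.urlopen(req, timeout=300) as r:
        return r.status, r.read()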
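To pin down the moment /health stops answering while the uploads run, a small probe can be left running in a second shell. This is a minimal sketch using only the standard library; the function names probe_health and watch_health are ours (hypothetical), and the only everOS detail it relies on is the GET /health endpoint on http://127.0.0.1:8000 shown above.

```python
import time
import urllib.error
import urllib.request


def probe_health(url, timeout=5.0):
    """Return the HTTP status code, or None if the server did not answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 502.
        return exc.code
    except (urllib.error.URLError, TimeoutError, OSError):
        # Connection refused, reset, or timed out.
        return None


def watch_health(url, interval=5.0, rounds=3, timeout=5.0):
    """Probe `rounds` times and return a list of (unix_time, status) samples."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), probe_health(url, timeout)))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    for ts, status in watch_health("http://127.0.0.1:8000/health"):
        print(f"{ts:.0f} status={status}")
```

A None status after a run of 200s marks the hang window, which can then be lined up against the everOS log timestamps.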
Expected behavior
Each POST /api/v1/knowledge/documents should:
- Accept the request
- Run extract_knowledge (LLM call) and write the markdown + cascade entry
- Return 201 within bounded time
/health should keep returning 200 throughout, even while extract_atomic_facts / cascade are processing in the background. The cascade worker should not block user-facing API responses.
Suspected root cause
Based on observable behavior, the deadlock likely involves these pieces competing in the single event loop:
- Runner.run() in infra/ome/_dispatch/runner.py:128 holds the OME engine semaphore for the entire retry chain (max_retries × timeout). When the inner LLM call exceeds the httpx client timeout, the retry keeps re-entering and the semaphore never releases.
- cascade_worker _run_loop() calls extract_atomic_facts for each newly-written doc. While processing one row, it holds an internal claim (status='processing' in md_change_state) and is awaiting an LLM call. New requests that try to ingest a doc want the same LanceDB connection pool.
- LanceDB writes (knowledge_topic.lance) serialize through a single writer. cascade_worker trying to upsert + the request handler trying to upsert + extract_atomic_facts all touching the same table → cross-task dependency on the same async resource.
- The semaphore has no per-attempt timeout: if a downstream task (cascade) holds the slot and is itself awaiting the LLM, all upstream tasks wait indefinitely.
The combination of:
- single asyncio event loop
- one shared engine semaphore
- one cascade worker serializing through that semaphore
- one LanceDB writer per table
creates a deadlock window whenever the LLM call duration exceeds the request-handler timeout and the cascade worker happens to be processing the same table.
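The first suspected ingredient, a semaphore held across the whole retry chain, can be reproduced in isolation. The sketch below is not everOS code: run_with_retries, ingest_request and flaky_llm_call are hypothetical stand-ins, and the timings are shrunk to milliseconds. It only shows that a second task needing the same asyncio.Semaphore is starved for max_retries × attempt_timeout, which is the pattern described above.

```python
import asyncio


async def flaky_llm_call(delay):
    """Stand-in for an LLM request that never answers within the attempt timeout."""
    await asyncio.sleep(delay)


async def run_with_retries(sem, max_retries, attempt_timeout, llm_delay):
    """Hold the semaphore across the whole retry chain (the suspected pattern)."""
    async with sem:
        for _ in range(max_retries):
            try:
                await asyncio.wait_for(flaky_llm_call(llm_delay), attempt_timeout)
                return True
            except asyncio.TimeoutError:
                continue  # retry without ever releasing the semaphore
    return False


async def ingest_request(sem, wait_budget):
    """A second task (for example a new upload) that needs the same semaphore."""
    try:
        await asyncio.wait_for(sem.acquire(), wait_budget)
    except asyncio.TimeoutError:
        return "starved"
    sem.release()
    return "served"


async def demo():
    sem = asyncio.Semaphore(1)
    # Worker holds the slot for 5 attempts x 0.05 s = 0.25 s in total.
    worker = asyncio.create_task(run_with_retries(sem, 5, 0.05, 1.0))
    await asyncio.sleep(0.01)  # let the worker take the semaphore first
    outcome = await ingest_request(sem, 0.1)  # gives up after 0.1 s
    worker_ok = await worker
    return outcome, worker_ok


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy version the slot is eventually released once the retries are exhausted, so it shows starvation rather than a true deadlock; the permanent hang we observe would additionally need the cross-dependency on the LanceDB writer or the processing row.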
Workarounds we tried (all partial)
| Workaround | Effect |
| --- | --- |
| Increase LLMConfig.timeout 60→300 s | Fewer timeouts, but doesn't prevent the hang: the LLM call itself succeeds, but the hang still happens downstream |
| Disable cascade scanner via EVEROS_CASCADE_SCANNER_DISABLED=1 | Reduces noise but cascade_worker still runs |
| Reduce -c to give more llama-server headroom | Doesn't help; the bottleneck is not the LLM |
| Add more llama-server slots | Doesn't help; only 1–2 active requests in flight at hang time |
| Drop --reasoning on llama-server | Helps indirectly but does not eliminate the hang |
The most reliable workaround we found is to bulk-import the markdown files directly into <root>/default_app/default_project/knowledge/Technology/... and then let cascade drain at its own pace with no concurrent ingest. But this defeats the purpose of the HTTP API and doesn't work for users who want to upload programmatically.
Environment
EverOS: 1.1.0 (PyPI) — also reproduced on 1.0.1
Python: 3.12
Runtime: bare-metal Windows 11
LLM: Qwen3.5-9B-UD-Q4_K_XL (local llama-server b9469, --reasoning off, --cache-type-k q8_0)
and minimax/minimax-m3 (https://api.minimaxi.com/v1)
Embedding: 9B via llama-server --embedding --pooling last, everOS truncates to 1024-d
KB size at hang: ~840 documents, ~5 GB LanceDB knowledge_topic
Concurrent uploads: 1 (sequential ingest script, ~10 s per file, ~80 KB chunks)
Possible fixes (suggestion only — maintainers decide)
This is not a small change, but a few directions look promising:
- Make Runner.run (and the OME semaphore) honor a per-attempt deadline so a stuck task can't hold the semaphore indefinitely.
- Decouple cascade_worker from the OME engine semaphore: let it run in its own bounded queue with a smaller concurrency budget, so ingest requests never wait on cascade.
- Add a watchdog task that, every N seconds, force-releases stuck processing rows and resets their retry_count.
- Make /health truly independent; currently it goes through the same middleware chain that can be blocked.
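As an illustration of the first direction, here is a minimal sketch of a retry loop that takes the semaphore per attempt instead of per chain. It is an assumption about the shape of a fix, not the actual Runner.run: run_with_attempt_deadline, make_call and the demo tasks are hypothetical names, and the timings are shrunk for the demo.

```python
import asyncio


async def run_with_attempt_deadline(sem, make_call, max_retries, attempt_timeout):
    """Acquire the semaphore per attempt, so a timed-out attempt frees the slot."""
    for attempt in range(max_retries):
        async with sem:
            try:
                return await asyncio.wait_for(make_call(attempt), attempt_timeout)
            except asyncio.TimeoutError:
                pass  # leaving the `async with` block releases the semaphore
        await asyncio.sleep(0)  # yield so waiting tasks can take the slot
    raise TimeoutError(f"gave up after {max_retries} attempts")


async def demo():
    sem = asyncio.Semaphore(1)
    order = []

    async def slow_call(attempt):
        order.append(f"attempt-{attempt}")
        await asyncio.sleep(1.0)  # always exceeds the attempt timeout

    async def ingest():
        async with sem:
            order.append("ingest")

    async def stuck_worker():
        try:
            await run_with_attempt_deadline(sem, slow_call, 3, 0.05)
        except TimeoutError:
            order.append("worker-gave-up")

    worker = asyncio.create_task(stuck_worker())
    await asyncio.sleep(0.01)  # worker grabs the slot before ingest arrives
    await asyncio.gather(worker, asyncio.create_task(ingest()))
    return order


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape the waiting ingest task gets the slot after the first timed-out attempt rather than after the whole chain, at the cost of the worker possibly queueing behind other tasks between attempts.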
Happy to test any patches or provide more traces if helpful.