What happened
While debugging an OV (OpenViking memory server) configuration, an API key was pasted into a Claude Code conversation that OV's auto-capture hook records. OV captured the full message stream into a session (raw messages.jsonl, expected). The memory extraction pipeline then copied the key verbatim into a durable structured memory under data/viking/.../memories/events/<date>/<name>.md and indexed it into the vector store.
Result: the secret became retrievable across future sessions via find / search — a vector recall for an unrelated token surfaced the memory and returned the plaintext key into the LLM context.
Expected
The extraction pipeline's quality gates (the read-before-write + 6-gate filter that decides what becomes a durable memory) should prevent secrets from surviving into curated memories and the vector index.
Raw session capture containing a secret is expected — that's raw conversation, and redacting there would alter the raw-capture contract. But a curated, durable, vector-indexed memory containing a secret contradicts the gates' purpose: the whole point of the filter is to discard what should not be persisted.
Repro
- Run a session where one message contains a test-shaped secret, e.g. sk-test-FAKE0123456789abcdef.
- Trigger extraction (session commit / extract / SessionEnd).
- Inspect data/viking/.../memories/events/<date>/: an extracted memory contains the secret verbatim.
- search / find for a token that appears near the secret: the vector store returns that memory, surfacing the secret into context.
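The inspection step can be automated with a small check. This is a minimal sketch that assumes only that extracted memories are Markdown files somewhere under a directory you pass in. The function name find_leaks is hypothetical, and the elided part of the path is left for you to supply.

```python
from pathlib import Path

TEST_SECRET = "sk-test-FAKE0123456789abcdef"  # the fake key from the repro steps


def find_leaks(memories_root: str, marker: str = TEST_SECRET) -> list[Path]:
    """Return every .md file under memories_root whose text contains marker."""
    leaks = []
    for path in sorted(Path(memories_root).rglob("*.md")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8
        if marker in path.read_text(encoding="utf-8", errors="replace"):
            leaks.append(path)
    return leaks
```

A non-empty return value after extraction means the bug reproduced; an empty list means no curated memory under that directory contains the marker.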
Proposal
Add a secret-scrub gate between LLM extraction and persistence:
- Regex patterns for common secret shapes:
  - sk-[A-Za-z0-9]{16,} (OpenAI-style)
  - AQ[A-Za-z0-9_-]{20,} (Gemini-style)
  - xox[baprs]-[A-Za-z0-9-]+ (Slack)
  - Bearer\s+[A-Za-z0-9._-]+
  - high-entropy hex / base64 of length ≥ 32
- On match in an extracted memory body: replace with REDACTED_SECRET (or skip vector-indexing that memory, or flag it for review).
- Configurable via env / ov.conf: pattern list + on/off, so users can extend for internal key shapes; default conservative to avoid false-positive scrubbing of legitimate tokens / UUIDs / commit SHAs.
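To make the proposal concrete, here is a rough sketch of what such a gate could look like. Everything named here (ScrubConfig, ScrubResult, scrub_memory, the on_match values) is hypothetical and not part of OV today, and the real env / ov.conf keys are left undefined. One deliberate deviation from the list above: the sk- pattern as written would not match the repro key sk-test-FAKE0123456789abcdef, because the hyphen after "test" ends the run early. The sketch therefore also admits "-" and "_" and adds a leading \b so that words like "task-..." are not caught.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

PLACEHOLDER = "REDACTED_SECRET"

# Patterns from the proposal; the sk- entry is widened (assumption, see lead-in).
DEFAULT_PATTERNS: Tuple[str, ...] = (
    r"\bsk-[A-Za-z0-9_-]{16,}",
    r"AQ[A-Za-z0-9_-]{20,}",
    r"xox[baprs]-[A-Za-z0-9-]+",
    r"Bearer\s+[A-Za-z0-9._-]+",
)


@dataclass
class ScrubConfig:
    """Hypothetical settings object standing in for env / ov.conf values."""
    enabled: bool = True
    patterns: Tuple[str, ...] = DEFAULT_PATTERNS
    on_match: str = "redact"  # "redact", "skip_index", or "flag"


@dataclass
class ScrubResult:
    body: str
    hits: int = 0          # number of secret-shaped matches found
    index: bool = True     # whether the memory may be vector-indexed
    flagged: bool = False  # whether the memory should go to review


def scrub_memory(body: str, cfg: Optional[ScrubConfig] = None) -> ScrubResult:
    """Apply the secret-scrub gate to one extracted memory body."""
    cfg = cfg or ScrubConfig()
    if cfg.on_match not in ("redact", "skip_index", "flag"):
        raise ValueError(f"unknown on_match action: {cfg.on_match!r}")
    # An empty pattern list must not compile to "", which matches everywhere.
    if not cfg.enabled or not cfg.patterns:
        return ScrubResult(body=body)
    combined = re.compile("|".join(f"(?:{p})" for p in cfg.patterns))
    if cfg.on_match == "redact":
        scrubbed, hits = combined.subn(PLACEHOLDER, body)
        return ScrubResult(body=scrubbed, hits=hits)
    hits = sum(1 for _ in combined.finditer(body))
    if hits == 0:
        return ScrubResult(body=body)
    # Assumption: both non-redacting actions keep the memory out of the index.
    return ScrubResult(body=body, hits=hits, index=False,
                       flagged=(cfg.on_match == "flag"))
```

With the defaults, scrub_memory("key is sk-test-FAKE0123456789abcdef ok").body comes back as "key is REDACTED_SECRET ok". The gate takes and returns plain strings so it can sit between extraction and persistence without touching session capture.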
Scope
Extraction layer only. Session-layer raw capture (messages.jsonl) intentionally unchanged — redacting there would break the raw-capture contract and make raw logs useless for debugging.
Caveat
Secret detection has false-positive risk (commit SHAs, content hashes, long UUIDs can look high-entropy). Suggest opt-in initially + an allowlist for known-safe patterns, so legitimate identifiers are not mangled.
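A sketch of how the high-entropy check and the allowlist might interact, to illustrate the trade-off. The candidate character class, the 3.5 bits-per-character threshold, the allowlist entries, and the function names are all illustrative assumptions rather than tuned values.

```python
import math
import re
from collections import Counter

# Known-safe shapes that are never reported (illustrative allowlist).
ALLOWLIST = (
    re.compile(r"[0-9a-f]{40}"),  # full Git commit SHA-1
    re.compile(r"[0-9a-f]{64}"),  # SHA-256 content hash
    re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"),  # UUID
)

# Runs of hex / base64-looking characters, length >= 32.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_-]{32,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def suspicious_tokens(text: str, min_entropy: float = 3.5) -> list[str]:
    """Return long, high-entropy tokens that match no allowlist entry."""
    found = []
    for match in CANDIDATE.finditer(text):
        token = match.group()
        if any(p.fullmatch(token) for p in ALLOWLIST):
            continue  # looks like a SHA or UUID; leave it alone
        if shannon_entropy(token) >= min_entropy:
            found.append(token)
    return found
```

The allowlist cuts both ways: a real secret that happens to be 40 lowercase hex characters would pass as a commit SHA, and long paths or hyphenated identifiers can still clear the entropy threshold. That is one more reason to keep this check opt-in at first.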
Why I'm filing
I have two open PRs in the auto-capture / compaction direction (#2874, #2853). This issue is the privacy complement to that storage work — reducing what gets captured/stored is half the story; ensuring secrets don't survive curation into vector-retrievable memory is the other half. Happy to contribute a PR if there's appetite for the approach above.