
Jack-Byrne/code-indexer-mcp


code-indexer

Index a codebase into a SQLite property graph and ChromaDB embeddings, with a FastMCP server for agents.

Quick start

Run these commands from the root of the project you want to index (this repo works too, once installed with pip install -e ".[local-embed]" or pip install -e .):

cd /path/to/repo
pip install -e ".[local-embed]"   # optional: local embeddings (sentence-transformers)
# Index (pick one embedding mode):
code-indexer index --root . --embedding-provider local --embedding-model all-MiniLM-L6-v2
# or, with API embeddings:
# export OPENAI_API_KEY=...
# code-indexer index --root . --embedding-provider api --embedding-model text-embedding-3-small

code-indexer explore --root .     # human-readable summary of graph + chroma paths
code-indexer status --root .      # manifest JSON (file hashes, last commit if git)
code-indexer search "authentication middleware" --root .
code-indexer mcp --root .         # stdio MCP server (sets CODE_INDEXER_ROOT for this process)

Artifacts are written to .code-indexer/ under --root (ignored by git in this project).

Environment

  • OPENAI_API_KEY — for API embeddings (--embedding-provider api) and LLM summaries (--llm).

Artifact layout

.code-indexer/ under the repo root (default):

  • graph.sqlite — nodes and edges
  • chroma/ — persistent Chroma data
  • manifest.json — file hashes for incremental index
  • config.json — resolved config snapshot from last index (API keys redacted)
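manifest.json lets a re-run skip unchanged files. The sketch below shows the general idea of hash-based incremental indexing. It is an illustration under stated assumptions, not the package's code: the helper names (scan, diff_hashes), the SHA-256 choice, the flat {relative_path: hash} shape, and the skipped directories are all hypothetical, and the real manifest schema may differ.

```python
import hashlib
from pathlib import Path

# Hypothetical ignore set; the real indexer applies its own ignore rules.
SKIP_DIRS = {".git", ".code-indexer"}


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style path (relative to root) to its content hash."""
    hashes: dict[str, str] = {}
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        # Skip directories themselves and anything under an ignored directory.
        if p.is_file() and not SKIP_DIRS.intersection(rel.parts):
            hashes[rel.as_posix()] = hash_file(p)
    return hashes


def diff_hashes(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files by comparing a previous hash map with the current one."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }
```

With this split, only "added" and "changed" files need re-parsing and re-embedding, and "removed" files would have their nodes and vectors dropped. That is the usual design for hash manifests and is assumed here, not confirmed by the source.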

MCP (FastMCP / stdio)

Point CODE_INDEXER_ROOT at the repo you indexed, then run:

export CODE_INDEXER_ROOT=/path/to/repo
code-indexer mcp

Tools include code_search, similar_tests, get_symbol, find_symbols, outline_file, outline_component, get_callers, get_callees, trace_flow, index_status, and index_refresh.
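The MCP server finds its data through CODE_INDEXER_ROOT and the artifact layout above. A minimal sketch of that lookup follows. resolve_artifact_paths and require_index are hypothetical helpers, not the package's API. Falling back to the current directory when the variable is unset is also an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_artifact_paths(root: Optional[str] = None) -> dict[str, Path]:
    """Locate index artifacts (hypothetical helper).

    Precedence assumed here: explicit root, then CODE_INDEXER_ROOT, then the CWD.
    """
    base = Path(root or os.environ.get("CODE_INDEXER_ROOT") or os.getcwd()).resolve()
    artifacts = base / ".code-indexer"  # default artifact directory under the root
    return {
        "root": base,
        "graph": artifacts / "graph.sqlite",
        "chroma": artifacts / "chroma",
        "manifest": artifacts / "manifest.json",
        "config": artifacts / "config.json",
    }


def require_index(paths: dict[str, Path]) -> None:
    """Fail early with a helpful message if the repo has not been indexed yet."""
    if not paths["graph"].is_file():
        raise FileNotFoundError(
            f"No index at {paths['graph']}; run `code-indexer index --root {paths['root']}` first"
        )
```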

Tests

pip install -e ".[dev,local-embed]"
pytest tests/ -q                    # includes integration test (indexes repo + MCP stdio)
pytest tests/ -q -m "not integration" # skip the slow MCP integration test
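The -m "not integration" filter works because the slow tests carry a pytest marker named integration. One common way to register that marker is a pytest_configure hook in conftest.py, sketched below. This setup is an assumption: the repo may declare the marker in pyproject.toml instead, and the test name in the comment is hypothetical.

```python
# conftest.py (hypothetical sketch)

def pytest_configure(config):
    # Registering the marker avoids PytestUnknownMarkWarning (or an error
    # under --strict-markers) when tests use @pytest.mark.integration.
    config.addinivalue_line(
        "markers",
        "integration: slow end-to-end tests (index the repo, talk MCP over stdio)",
    )

# In a test module (hypothetical test name):
# @pytest.mark.integration
# def test_mcp_stdio_roundtrip(): ...
```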

Inspecting the index (not a black box)

Everything lives under your artifact directory (default .code-indexer/ under the repo root).

  1. CLI summary — counts, file paths, and sample nodes/edges:

    code-indexer explore --root /path/to/repo
    code-indexer explore --root . --json              # full JSON (includes Chroma `peek`)
    code-indexer explore --root . --export-graph graph.json
  2. SQLite graph — open graph.sqlite in DB Browser for SQLite or the sqlite3 shell. Tables: nodes, edges. Example:

    sqlite3 .code-indexer/graph.sqlite "SELECT kind, COUNT(*) FROM nodes GROUP BY kind;"
    sqlite3 .code-indexer/graph.sqlite "SELECT id, kind, path, name FROM nodes LIMIT 20;"
  3. Chroma vectors — folder chroma/ is Chroma’s on-disk store; collection name is repo_id__embedding_model_id (see explore output). Use code-indexer explore --json to see peek sample rows, or query from Python with chromadb.PersistentClient(path=".../chroma").

  4. Graph visualization — export JSON and convert to Graphviz/D3/etc.:

    code-indexer explore --root . --export-graph /tmp/graph.json

    The file has nodes and edges with ids you can feed to graph layout tools.
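Converting the exported JSON to Graphviz DOT takes only a few lines. The sketch below assumes a hypothetical schema: a top-level "nodes" list whose entries have "id" and optionally "name", and an "edges" list whose entries have "source", "target" and optionally "kind". Check the keys in your own export and adjust before relying on it. graph_json_to_dot and export_dot are illustrative names, not part of code-indexer.

```python
import json
from pathlib import Path


def _q(value) -> str:
    """Quote a value as a DOT ID, escaping embedded double quotes."""
    return '"' + str(value).replace('"', '\\"') + '"'


def graph_json_to_dot(data: dict) -> str:
    """Render an exported graph dict (assumed schema) as a DOT digraph."""
    lines = ["digraph code {"]
    for node in data.get("nodes", []):
        label = node.get("name") or node["id"]  # fall back to the id when unnamed
        lines.append(f"  {_q(node['id'])} [label={_q(label)}];")
    for edge in data.get("edges", []):
        attrs = f" [label={_q(edge['kind'])}]" if edge.get("kind") else ""
        lines.append(f"  {_q(edge['source'])} -> {_q(edge['target'])}{attrs};")
    lines.append("}")
    return "\n".join(lines)


def export_dot(json_path: str, dot_path: str) -> None:
    """Read an --export-graph file and write the DOT equivalent."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    Path(dot_path).write_text(graph_json_to_dot(data), encoding="utf-8")
```

For example, export_dot("/tmp/graph.json", "/tmp/graph.dot") followed by dot -Tsvg /tmp/graph.dot -o graph.svg renders the graph. On a large repo, filter nodes by kind first, because Graphviz layouts get slow and unreadable at thousands of nodes.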

Roadmap / backlog

See TICKETS.md for remaining plan items, including hybrid search, git diff optimization, flow summaries, and vectorized LLM summaries for semantic search over plain-language descriptions.
