An intelligent AI agent that autonomously routes queries between web search and private knowledge bases, built with LangGraph's agentic workflow framework.
This project demonstrates an autonomous research agent that intelligently decides how to answer your questions:
- Public Information β Searches the web using Tavily API
- Private Information β Searches your local documents using RAG (Retrieval-Augmented Generation)
The agent uses a cyclic graph architecture (not a linear chain) to self-correct and re-assess retrieved data before generating final answers, making it more reliable and accurate than traditional chatbots.
- Orchestration Framework: LangGraph (State Machine with Cyclic Graphs)
- Language Model: Google Gemini 2.5 Flash
- Vector Database: ChromaDB with Google Generative AI Embeddings (768-dimensional)
- Web Search Tool: Tavily API (optimized for AI agents)
- Memory System: RAG (Retrieval-Augmented Generation)
User Query
β
[Agent Node] β Analyzes intent
β
[Router Logic] β Conditional edge decides:
βββ Public info? β [Tavily Search Tool]
βββ Private info? β [RAG Retrieval Tool]
β
[Tool Node] β Executes search
β
[Agent Node] β Re-evaluates results (self-correction loop)
β
Final Answer
Key Feature: The agent can loop back to tools multiple times until it has sufficient information, enabling multi-step reasoning and verification.
- Python 3.11+
- Google Gemini API Key
- Tavily API Key
-
Clone the repository
git clone <your-repo-url> cd "Research Agent"
-
Create and activate virtual environment
python3.11 -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate
-
Install dependencies
Option A: Using
uv(Recommended - Fast & Modern)# Install uv if you haven't already curl -LsSf https://astral.sh/uv/install.sh | sh # Install dependencies uv pip install -r requirements.txt
Option B: Using
pip(Traditional)pip install -r requirements.txt
-
Set up environment variables
Create a
.envfile in the project root:GEMINI_API_KEY=your_gemini_api_key_here TAVILY_API_KEY=your_tavily_api_key_here
-
Test API connections
python test_gemini.py python tools.py
This verifies your Gemini and Tavily API keys are working correctly.
-
Build the vector database
python database.py
This loads
notes.txtand creates a ChromaDB vector database for RAG retrieval. -
Run the agent
python agent.py
Research Agent/
βββ agent.py # Main agent logic with LangGraph workflow
βββ database.py # Vector database setup and document ingestion
βββ tools.py # Tool definitions (Tavily search, RAG retrieval)
βββ test_gemini.py # API connection test
βββ requirements.txt # Python dependencies
βββ notes.txt # Your private knowledge base (customize this!)
βββ chroma_db/ # Vector database storage (auto-generated)
βββ .env # API keys (not tracked in git)
βββ README.MD # This file
You: What's the latest news about AI?
π€ Agent: [Searches web via Tavily and returns current information]
You: What's my project deadline?
π€ Agent: [Searches your notes.txt via RAG and returns stored information]
You: Compare my favorite movie to current box office hits
π€ Agent: [Uses RAG for your favorite, Tavily for current hits, then synthesizes]
Unlike traditional AI agents that run linearly, this project uses LangGraph to create a state machine with loops:
User β LLM β Tool β Answer
User β [Agent Node] β· [Tool Node] β Answer
β____________β
(Self-correction loop)
The agent maintains conversation state using AgentState:
- Tracks all messages (user questions, tool results, agent responses)
- Uses
add_messagesreducer to append new messages without overwriting history - Enables multi-turn conversations with full context
- Edit
notes.txtwith your private information - Run
python database.pyto rebuild the vector database - The agent will now search your custom knowledge base
In agent.py, modify the retriever parameters:
retriever = db.as_retriever(search_kwargs={"k": 3}) # Return top 3 resultsReplace Gemini with another model:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")| Package | Purpose |
|---|---|
langchain |
Core LLM framework |
langgraph |
State machine and agentic workflow orchestration |
langchain-google-genai |
Google Gemini integration |
tavily-python |
Web search optimized for AI agents |
chromadb |
Local vector database for semantic search |
python-dotenv |
Secure API key management |
RAG (Retrieval-Augmented Generation) enables semantic search over your documents:
- Embedding: Text is converted to 768-dimensional vectors
- Storage: Vectors are stored in ChromaDB
- Search: User queries are embedded and compared using cosine similarity
- Retrieval: Most similar documents are returned to the LLM
Example: Searching for "food" will find "pizza" and "burger" even if you didn't type those exact words, because they live in the same semantic "neighborhood."
This project demonstrates:
- β Agentic AI workflows with LangGraph
- β Tool-calling and function execution
- β RAG implementation with vector databases
- β Conditional routing and decision-making
- β State management in AI applications
- β Self-correcting AI systems
MIT License - Feel free to use this project for learning and development.
Contributions welcome! Feel free to:
- Add new tools (e.g., calculator, database queries)
- Improve the routing logic
- Enhance the RAG retrieval quality
- Add conversation memory persistence
Built with β€οΈ using LangGraph and Google Gemini