learnwithparam/production-agent-seven-layers

Production Agent: Seven Layers

Build an LLM agent the way real teams ship them: as seven composable layers, each with a clear job, a clear contract, and a clear failure mode. Every request flows through the same pipeline and emits a per-layer trace so you can debug production in minutes, not hours.

Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.

What You'll Learn

  • Design an agent as a graph of small, testable layers instead of one giant function
  • Add transport validation and rate limiting before anything reaches your LLM
  • Route intents with a tiny state machine and branch to tools, retrieval, or direct reply
  • Run tools safely with whitelists and argument parsing
  • Keep per-thread memory with sane eviction limits
  • Ground answers with ChromaDB semantic search when retrieval beats generation
  • Enforce guardrails on both input and output (PII scrub, length caps)
  • Emit structured traces and logs so every request is observable out of the box
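
As an illustration of "whitelists and argument parsing", here is a minimal sketch of a tool registry with a calculator that never calls eval. It parses the expression with Python's ast module and only evaluates whitelisted operators. The names TOOLS, dispatch, and the 100-character limit are assumptions made for this example, not the course's actual code.

```python
import ast
import operator
from datetime import datetime, timezone

# Whitelisted operators (an assumption for this sketch; extend as needed).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate numeric literals and whitelisted operators only."""
    if isinstance(node, ast.Constant):
        value = node.value
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Names, calls, attributes, and anything else are rejected.
    raise ValueError("unsupported expression")


def calculator(expression: str):
    """Evaluate basic arithmetic without eval()."""
    if len(expression) > 100:  # hypothetical size cap
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("invalid expression") from exc
    return _eval_node(tree.body)


def get_time() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


# The registry is the whitelist: only names listed here can be dispatched.
TOOLS = {"calculator": calculator, "get_time": get_time}


def dispatch(name: str, **kwargs):
    if name not in TOOLS:
        raise KeyError(f"tool not allowed: {name}")
    return TOOLS[name](**kwargs)
```

Calling dispatch("calculator", expression="2 + 3 * 4") evaluates the arithmetic, while an expression such as "__import__('os')" is rejected because function calls are not in the whitelist.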

The Seven Layers

  1. Transport - validates the request, enforces size and basic rate limits
  2. Orchestrator - a small state machine that picks tool_use, retrieve, or reply
  3. Tools - get_time and calculator, dispatched through a safe registry
  4. Memory - per-thread bounded conversation history
  5. Retrieval - ChromaDB semantic search over a seed knowledge base (degrades gracefully when disabled)
  6. Guardrails - PII scrubbing and length checks on every input and output
  7. Observability - per-request trace with per-layer timings plus a structlog event
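
To make the idea of "every request flows through the same pipeline and emits a per-layer trace" concrete, here is a minimal sketch using only the standard library. The function run_pipeline, the two toy layers, and the trace field names (layer, status, ms) are hypothetical and chosen for illustration. The real project may structure its layers and trace differently.

```python
import time
from typing import Callable

# A layer takes the request state and returns the updated state.
Layer = Callable[[dict], dict]


def run_pipeline(layers: list[tuple[str, Layer]], request: dict) -> tuple[dict, list[dict]]:
    """Run layers in order, recording status and elapsed time for each one."""
    state = dict(request)
    trace: list[dict] = []
    for name, layer in layers:
        start = time.perf_counter()
        try:
            state = layer(state)
            status = "ok"
        except Exception as exc:
            status = f"error: {type(exc).__name__}"
        elapsed_ms = (time.perf_counter() - start) * 1000
        trace.append({"layer": name, "status": status, "ms": elapsed_ms})
        if status != "ok":
            # Stop at the first failing layer and note where it happened.
            state["failed_layer"] = name
            break
    return state, trace


def transport(state: dict) -> dict:
    """Toy transport layer: reject empty messages."""
    if not state.get("message", "").strip():
        raise ValueError("empty message")
    return state


def orchestrator(state: dict) -> dict:
    """Toy router: messages containing digits go to tool_use, others to reply."""
    route = "tool_use" if any(ch.isdigit() for ch in state["message"]) else "reply"
    return {**state, "route": route}


LAYERS = [("transport", transport), ("orchestrator", orchestrator)]
```

Because a failure is recorded in the trace rather than raised, the trace always shows which layer a request reached and how long each step took.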

Tech Stack

  • FastAPI - async Python web framework
  • Pydantic - request and response validation
  • ChromaDB + Sentence Transformers - embedded vector store and local embeddings
  • structlog - structured logs for every request
  • LLM Provider Pattern - supports OpenRouter, Fireworks, Gemini, OpenAI
  • Docker - containerized development

Getting Started

Prerequisites

  • Python 3.11+
  • uv (installed automatically by make setup)
  • An API key from any supported LLM provider

Quick Start

make dev

# Or step by step:
make setup
# edit .env and add your API key
make run

With Docker

make build
make up
make logs
make down

API Documentation

Once running, open http://localhost:8000/docs for the interactive Swagger UI.

Primary endpoints:

  • GET /production-agent/health - liveness check plus the list of layers
  • POST /production-agent/chat - send a message to the agent; JSON body { "message": "...", "thread_id": "t1" }
  • GET /production-agent/trace/{thread_id} - the most recent per-layer trace for a thread
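
The chat endpoint above can be exercised from Python with only the standard library. This is a sketch: the helper names build_chat_request and send are made up for the example, the base URL assumes the default local server, and the response is assumed to be JSON. Its exact fields are not documented here, so inspect the Swagger UI for the real schema.

```python
import json
import urllib.request


def build_chat_request(message: str, thread_id: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for /production-agent/chat with a JSON body."""
    body = json.dumps({"message": message, "thread_id": thread_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/production-agent/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the server to be running):
# reply = send(build_chat_request("What time is it?", "t1"))
# print(reply)
```

Using the same thread_id for a follow-up call to GET /production-agent/trace/{thread_id} should return the trace for that conversation.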

Challenges

Work through these incrementally to build the full system:

  1. The Transport Layer - Validate inputs and add a tiny rate limiter
  2. The Orchestrator - Route between tool use, retrieval, and direct reply
  3. Tools with Guardrails - Register get_time and a safe calculator
  4. Thread Memory - Store bounded per-thread conversation history
  5. Retrieval Layer - Seed a ChromaDB collection and wire up semantic search
  6. Input and Output Guardrails - Scrub emails, phone numbers, and SSN-like tokens
  7. Observability - Build the per-request trace and expose GET /production-agent/trace/{thread_id}
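
As a starting point for challenge 6, here is a minimal sketch of regex-based scrubbing with a length cap. The patterns, placeholder tokens, and the MAX_CHARS value are assumptions for illustration. They are deliberately simple and will miss many real-world formats. Truncating over-long text is also an assumption: the actual guardrail may reject it instead.

```python
import re

# Order matters: SSN-like tokens are replaced before the broader phone pattern,
# which would otherwise match them too.
_PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")),
]

MAX_CHARS = 2000  # hypothetical cap


def scrub(text: str) -> str:
    """Replace emails, SSN-like tokens, and phone-like numbers with placeholders."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def apply_guardrails(text: str, max_chars: int = MAX_CHARS) -> str:
    """Scrub PII, then truncate to the length cap."""
    return scrub(text)[:max_chars]
```

The same function can run on the user's message before it reaches the LLM and on the model's reply before it is returned, which matches the "every input and output" wording of the guardrails layer.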

Makefile Targets

make help           Show all available commands
make setup          Initial setup (create .env, install deps)
make dev            Setup and run (one command!)
make run            Start FastAPI server
make build          Build Docker image
make up             Start container
make down           Stop container
make clean          Remove venv and cache
