The LayerLens Python SDK ships with 70+ runnable samples covering every API resource, from a single trace evaluation to enterprise compliance pipelines and multi-agent orchestration. All samples live in the samples/ directory and can be run directly after installing the SDK and setting your API key.
pip install layerlens --index-url https://sdk.layerlens.ai/package
export LAYERLENS_STRATIX_API_KEY=your-api-key
python samples/core/quickstart.py

quickstart.py walks through the complete workflow end-to-end: upload a trace, create a judge, run an evaluation, and retrieve results.
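Because the samples read the API key from the environment, a small preflight check can give a clearer failure than a sample erroring partway through. The following is a minimal sketch, not part of the SDK: the helper names `missing_api_key` and `run_sample` are hypothetical, and only the variable name `LAYERLENS_STRATIX_API_KEY` and the quickstart path come from the steps above.

```python
import os
import subprocess
import sys

# Variable name taken from the setup steps; everything else here is illustrative.
API_KEY_VAR = "LAYERLENS_STRATIX_API_KEY"


def missing_api_key(env):
    """Return True when the API key is absent or blank in the given mapping."""
    return not env.get(API_KEY_VAR, "").strip()


def run_sample(path="samples/core/quickstart.py"):
    """Run one sample with the current interpreter and return its exit code."""
    if missing_api_key(os.environ):
        print(f"{API_KEY_VAR} is not set; export it before running samples.",
              file=sys.stderr)
        return 1
    # Use the same interpreter that has the SDK installed.
    return subprocess.run([sys.executable, path]).returncode
```

Calling `run_sample()` from the repository root returns the sample's exit code, or 1 without launching anything when the key is missing.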
Located in samples/core/. Start here to learn how every LayerLens resource -- traces, judges, evaluations, results, models, and benchmarks -- works individually and together, including async patterns and pagination.
Key samples:
- quickstart.py -- Your first evaluation in under 30 lines
- trace_evaluation.py -- Full trace evaluation lifecycle
- judge_optimization.py -- Optimize judge accuracy via automated prompt engineering
- evaluation_pipeline.py -- Chain judges, traces, and results into an automated pipeline
- async_workflow.py -- Concurrent operations with AsyncStratix
See the Core SDK README for the full list.
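The concurrent pattern that async_workflow.py covers can be illustrated without the SDK. This is a minimal sketch using only asyncio: `evaluate_trace` is a hypothetical stand-in for a real AsyncStratix call (whose actual method names are not shown here), and the concurrency limit is an assumed value.

```python
import asyncio


async def evaluate_trace(trace_id):
    """Hypothetical stand-in for one SDK call; swap in a real AsyncStratix call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"trace_id": trace_id, "status": "done"}


async def evaluate_all(trace_ids, max_concurrency=5):
    """Evaluate every trace ID with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(trace_id):
        async with semaphore:
            return await evaluate_trace(trace_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(t) for t in trace_ids))


results = asyncio.run(evaluate_all(["t1", "t2", "t3"]))
```

Bounding concurrency with a semaphore keeps a large batch from opening too many simultaneous requests while still overlapping the waits.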
Located in samples/industry/. Domain-specific evaluation scenarios with judges tuned for regulated and high-stakes verticals including healthcare, financial services, legal, government, insurance, and retail.
Key samples:
- healthcare_clinical.py -- Clinical decision support evaluation
- financial_trading.py -- SOX-aligned trading compliance
- legal_contracts.py -- Contract review quality assessment
See the Industry Solutions README for the full list.
Located in samples/cowork/. Patterns for Claude Cowork, Agent Teams, or any multi-agent framework where multiple agents collaborate and each agent's output needs independent quality assessment.
Key samples:
- multi_agent_eval.py -- Generator-Evaluator pattern
- code_review.py -- Instrumentor-Reviewer pattern
- rag_assessment.py -- RAG quality evaluation
See the Multi-Agent README for the full list.
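As a rough illustration of the Generator-Evaluator idea, where one agent produces output and a second assesses it independently, the loop below retries generation until the evaluator accepts a draft. Both `generate` and `evaluate` are hypothetical toy functions, not the sample's code; in practice the evaluator role would be played by a LayerLens judge.

```python
def generate(prompt, attempt):
    """Hypothetical generator agent: each retry appends one more word of detail."""
    return f"{prompt} " + "detail " * attempt


def evaluate(draft, min_words=5):
    """Hypothetical evaluator agent: accepts drafts with at least min_words words."""
    words = len(draft.split())
    return {"passed": words >= min_words, "score": words}


def generate_until_accepted(prompt, max_attempts=3):
    """Regenerate until the evaluator passes a draft or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        draft = generate(prompt, attempt)
        verdict = evaluate(draft)
        if verdict["passed"]:
            break
    return draft, verdict, attempt


draft, verdict, attempts = generate_until_accepted("Summarize the incident")
```

The evaluator never sees how the draft was produced, only the draft itself, which is what keeps the assessment independent of the generator.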
Located in samples/cicd/. Embed evaluation quality gates into your build and deployment pipelines so regressions are caught before they reach production.
- quality_gate.py -- Gate deployments on evaluation pass rates
- pre_commit_hook.py -- Catch regressions at commit time
- github_actions_gate.yml -- Drop-in GitHub Actions workflow
See the CI/CD README for details.
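Gating on a pass rate comes down to comparing a ratio against a threshold and returning a non-zero exit code on failure, which is what makes a CI step fail. This is a minimal sketch under assumptions: the result shape (`{"passed": bool}`) and the threshold values are hypothetical, and it is not the code in quality_gate.py.

```python
def pass_rate(results):
    """Fraction of results whose 'passed' flag is truthy; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.get("passed")) / len(results)


def gate(results, threshold=0.9):
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(results)
    print(f"pass rate: {rate:.1%} (threshold {threshold:.1%})")
    return 0 if rate >= threshold else 1


# Hypothetical evaluation results: three of four passed.
sample_results = [{"passed": True}, {"passed": True},
                  {"passed": False}, {"passed": True}]
exit_code = gate(sample_results, threshold=0.75)
```

In a pipeline, the script would end with `sys.exit(gate(results))` so the build step fails whenever the rate drops below the threshold. Treating an empty result list as a 0.0 pass rate is a deliberate choice here, so that a run producing no evaluations cannot pass the gate by accident.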
Located in samples/integrations/. Trace and evaluate outputs from OpenAI and Anthropic with minimal instrumentation.
- openai_traced.py -- Trace an OpenAI completion and evaluate it
- anthropic_traced.py -- Capture multi-turn Claude conversations
Located in samples/modalities/. Apply specialized judges to different content types -- text responses, brand assets, and structured documents.
- text_evaluation.py -- Score text across safety, relevance, and compliance
- brand_evaluation.py -- Enforce brand voice consistency
- document_evaluation.py -- Validate document extraction accuracy
Located in samples/openclaw/. Trace, evaluate, and monitor OpenClaw autonomous AI agents using LayerLens -- including cage match model tournaments, code gating, drift detection, content auditing, honeypot skill auditing, and adversarial red-teaming.
See the OpenClaw README for the full list of integration samples and advanced evaluation patterns.
Located in samples/mcp/. Expose LayerLens capabilities as tools for Claude, Cursor, and any MCP-compatible AI assistant.
- layerlens_server.py -- MCP server with trace management, judge creation, and evaluation execution
See the MCP README for setup instructions.
Located in samples/copilotkit/. A full-stack canvas + chat sample built on langchain.agents.create_agent + CopilotKitMiddleware, with a runnable Next.js 16 + Tailwind 4 + shadcn/ui demo app under app/. The pattern mirrors CopilotKit's own coagents-research-canvas reference: state-driven cards on the host page, a chat sidebar with a frontend HITL widget, and out-of-band polling for long-running async work.
- agents/evaluator_agent.py -- LangGraph agent with four backend tools (list_recent_traces, list_judges, run_trace_evaluation, get_evaluation_result) and a frontend HITL tool (confirm_judge) for picking which judge to apply. The picker is a real React widget registered via useCopilotAction({ renderAndWaitForResponse }), bridged into the LLM's toolbelt by CopilotKitMiddleware -- no interrupt() call.
- agents/investigator_agent.py -- Standalone procedural StateGraph for trace investigation (errors / latency / cost hot spots). No HITL, no LLM. Reference for non-conversational agents.
- components/*.tsx -- Five reusable SDK card components (EvaluationCard, TraceCard, JudgeVerdictCard, MetricCard, ComplianceCard) plus MarkdownLite, re-exported as @layerlens/copilotkit-cards.
- app/ -- Runnable Next.js + FastAPI demo. Real LayerLens only -- a missing LAYERLENS_STRATIX_API_KEY is a hard error at startup.
Checkpointer note: The evaluator graph is compiled with InMemorySaver so ag_ui_langgraph's endpoint can call graph.aget_state(config) per request -- without it the AG-UI handler errors with "No checkpointer set" before any tool runs. The sample ships InMemorySaver for zero-setup local development; production deployments should swap to a durable saver (Postgres / SQLite / Redis / LangGraph Platform). See the sample's README for the full architecture walkthrough.
See the CopilotKit README for the full list.
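The out-of-band polling mentioned for long-running async work in the CopilotKit sample can be sketched generically: check a job's status on an interval until it reaches a terminal state or a deadline passes. The code below is an assumption-laden illustration, not the sample's implementation; `make_fake_job`, the state names, and the timing values are all hypothetical.

```python
import asyncio


async def poll_until_done(fetch_status, interval=0.01, timeout=1.0):
    """Await fetch_status() repeatedly until a terminal state or the timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status["state"] in ("succeeded", "failed"):
            return status
        if loop.time() >= deadline:
            raise TimeoutError("evaluation did not finish in time")
        await asyncio.sleep(interval)


def make_fake_job(polls_until_done=3):
    """Hypothetical job that reports 'running' until polled enough times."""
    calls = {"n": 0}

    async def fetch_status():
        calls["n"] += 1
        state = "succeeded" if calls["n"] >= polls_until_done else "running"
        return {"state": state, "polls": calls["n"]}

    return fetch_status


final = asyncio.run(poll_until_done(make_fake_job()))
```

Keeping the polling loop outside the chat turn lets the conversation stay responsive while the evaluation finishes in the background.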
Located in samples/claude-code/. Slash commands that bring LayerLens workflows directly into the Claude Code CLI -- manage traces, judges, evaluations, optimizations, benchmarks, and investigations without leaving your terminal.
See the Claude Code Skills README for the full list.
Located in samples/data/. Pre-built trace files, test datasets, and 16 industry-specific evaluation datasets so you can run every sample without generating your own data first.
See the Sample Data README for contents.
For the complete table of every sample with descriptions, see the samples README.