Samples Guide

The LayerLens Python SDK ships with 70+ runnable samples covering every API resource, from a single trace evaluation to enterprise compliance pipelines and multi-agent orchestration. All samples live in the samples/ directory and can be run directly after installing the SDK and setting your API key.

Quick Start

pip install layerlens --index-url https://sdk.layerlens.ai/package
export LAYERLENS_STRATIX_API_KEY=your-api-key
python samples/core/quickstart.py

quickstart.py walks through the complete workflow end-to-end: upload a trace, create a judge, run an evaluation, and retrieve results.
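Before running any sample, you can check the two prerequisites from the commands above. The sketch below is a hypothetical helper, not part of the SDK: `preflight` and `run_sample` are illustrative names. It assumes only that samples are plain scripts run with the same interpreter that has `layerlens` installed.

```python
import os
import subprocess
import sys
from pathlib import Path

API_KEY_VAR = "LAYERLENS_STRATIX_API_KEY"  # the variable exported in the Quick Start


def preflight(env: dict[str, str], sample: Path) -> list[str]:
    """Return the problems that would stop a sample from running; an empty list means ready."""
    problems = []
    if not env.get(API_KEY_VAR, "").strip():
        problems.append(f"{API_KEY_VAR} is not set")
    if not sample.is_file():
        problems.append(f"sample not found: {sample}")
    return problems


def run_sample(sample: Path) -> int:
    """Run one sample script in a child process after the preflight checks pass."""
    problems = preflight(dict(os.environ), sample)
    for problem in problems:
        print(f"preflight: {problem}", file=sys.stderr)
    if problems:
        return 1
    # sys.executable reuses the current interpreter, so the installed layerlens package is importable.
    return subprocess.run([sys.executable, str(sample)], check=False).returncode
```

For example, `run_sample(Path("samples/core/quickstart.py"))` returns 1 and reports what is missing when the key is unset or the path is wrong. It does not launch the script in that case.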

Samples by Category

Core SDK Operations (18 samples)

Located in samples/core/. Start here to learn how every LayerLens resource -- traces, judges, evaluations, results, models, and benchmarks -- works individually and together, including async patterns and pagination.

Key samples:

quickstart.py -- Your first evaluation in under 30 lines
trace_evaluation.py -- Full trace evaluation lifecycle
judge_optimization.py -- Optimize judge accuracy via automated prompt engineering
evaluation_pipeline.py -- Chain judges, traces, and results into an automated pipeline
async_workflow.py -- Concurrent operations with AsyncStratix

See the Core SDK README for the full list.
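The async and pagination patterns covered by the core samples can be sketched without the SDK. In the code below, `fetch_page` and `evaluate` are hypothetical stand-ins for a paginated list call and an evaluation call, such as calls made through `AsyncStratix`. The rule that a short page marks the last page is an assumption about how pagination ends, not documented SDK behavior.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable

# Hypothetical stand-ins: fetch_page(page, page_size) -> one page of items,
# evaluate(item) -> one result. Neither is an SDK function.
FetchPage = Callable[[int, int], Awaitable[list[Any]]]
Evaluate = Callable[[Any], Awaitable[Any]]


async def iter_all(fetch_page: FetchPage, page_size: int = 50) -> AsyncIterator[Any]:
    """Yield every item across pages, stopping after the first short (or empty) page."""
    page = 1
    while True:
        items = await fetch_page(page, page_size)
        for item in items:
            yield item
        if len(items) < page_size:
            return
        page += 1


async def evaluate_all(items: list[Any], evaluate: Evaluate, limit: int = 5) -> list[Any]:
    """Evaluate every item with at most `limit` calls in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item: Any) -> Any:
        async with semaphore:
            return await evaluate(item)

    return list(await asyncio.gather(*(bounded(item) for item in items)))
```

The semaphore caps how many requests are in flight, so a large trace backlog does not open hundreds of concurrent calls at once. `asyncio.gather` returns results in the same order as the input IDs.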

Industry Solutions (10 samples)

Located in samples/industry/. Domain-specific evaluation scenarios with judges tuned for regulated and high-stakes verticals including healthcare, financial services, legal, government, insurance, and retail.

Key samples:

healthcare_clinical.py -- Clinical decision support evaluation
financial_trading.py -- SOX-aligned trading compliance
legal_contracts.py -- Contract review quality assessment

See the Industry Solutions README for the full list.

Multi-Agent Evaluation (5 samples)

Located in samples/cowork/. Patterns for Claude Cowork, Agent Teams, or any multi-agent framework where multiple agents collaborate and each agent's output needs independent quality assessment.

Key samples:

multi_agent_eval.py -- Generator-Evaluator pattern
code_review.py -- Instrumentor-Reviewer pattern
rag_assessment.py -- RAG quality evaluation

See the Multi-Agent README for the full list.
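The Generator-Evaluator pattern reduces to a loop. One agent drafts, an independent evaluator judges the draft, and the evaluator's feedback goes back to the generator. The loop ends when a draft passes or the round budget runs out. This is a framework-agnostic sketch: `generate`, `evaluate`, `Verdict`, and `max_rounds` are illustrative names. In a LayerLens setup, the `evaluate` callable would presumably wrap a judge evaluation rather than a local function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


def generate_with_review(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], Verdict],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, list[Verdict]]:
    """Draft, judge, and revise until the evaluator passes a draft or max_rounds is reached."""
    feedback = ""
    draft = ""
    history: list[Verdict] = []
    for _ in range(max_rounds):
        draft = generate(task, feedback)  # generator sees the previous round's feedback
        verdict = evaluate(draft)  # evaluator judges the draft independently
        history.append(verdict)
        if verdict.passed:
            break
        feedback = verdict.feedback
    return draft, history
```

The function returns the full verdict history, not just the final draft, so every round's assessment stays available for later inspection.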

CI/CD Integration (2 samples + workflow)

Located in samples/cicd/. Embed evaluation quality gates into your build and deployment pipelines so regressions are caught before they reach production.

quality_gate.py -- Gate deployments on evaluation pass rates
pre_commit_hook.py -- Catch regressions at commit time
github_actions_gate.yml -- Drop-in GitHub Actions workflow

See the CI/CD README for details.
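At its core, a quality gate turns a batch of evaluation verdicts into a process exit code that CI can act on. This is a minimal sketch, assuming each verdict has already been reduced to a pass/fail boolean. `gate`, `pass_rate`, and the 0.9 default threshold are illustrative and do not describe the behavior of quality_gate.py.

```python
from typing import Iterable


def pass_rate(verdicts: Iterable[bool]) -> float:
    """Fraction of passing verdicts; an empty run is an error, not a pass."""
    results = list(verdicts)
    if not results:
        raise ValueError("no evaluation results; refusing to gate on an empty run")
    return sum(results) / len(results)


def gate(verdicts: Iterable[bool], threshold: float = 0.9) -> int:
    """Return a process exit code: 0 when the pass rate meets the threshold, 1 otherwise."""
    rate = pass_rate(verdicts)
    passed = rate >= threshold
    print(f"quality gate {'PASS' if passed else 'FAIL'}: pass rate {rate:.1%} (threshold {threshold:.1%})")
    return 0 if passed else 1
```

A CI step would end with `sys.exit(gate(verdicts))`. Treating an empty result set as an error is deliberate: a misconfigured pipeline that produces no evaluations should fail the build, not pass it by default.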

LLM Provider Integrations (2 samples)

Located in samples/integrations/. Trace and evaluate outputs from OpenAI and Anthropic with minimal instrumentation.

openai_traced.py -- Trace an OpenAI completion and evaluate it
anthropic_traced.py -- Capture multi-turn Claude conversations
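Tracing a provider call comes down to recording its input, its output (or error), and its latency around the call. The decorator below is a generic, hypothetical sketch that writes records to a plain list. It is not the LayerLens tracing API. The `fake_completion` stub stands in for an OpenAI or Anthropic client call, so the example runs offline.

```python
import functools
import time
from typing import Any, Callable


def traced(sink: list[dict[str, Any]]) -> Callable[[Callable], Callable]:
    """Decorator factory: append one record per call (input, output or error, latency) to sink."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: dict[str, Any] = {"name": fn.__name__, "input": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                record["output"] = result
                return result
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call leaves a record.
                record["latency_s"] = time.perf_counter() - start
                sink.append(record)
        return wrapper
    return decorator


spans: list[dict[str, Any]] = []


@traced(spans)
def fake_completion(prompt: str) -> str:
    # Offline stand-in for a provider call such as an OpenAI or Anthropic chat request.
    return f"echo: {prompt}"
```

Because the record is written in a `finally` block, failed calls still produce a trace with their error and latency. Those failures are often exactly what an evaluation needs to see.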

Content-Type Evaluations (3 samples)

Located in samples/modalities/. Apply specialized judges to different content types -- text responses, brand assets, and structured documents.

text_evaluation.py -- Score text across safety, relevance, and compliance
brand_evaluation.py -- Enforce brand voice consistency
document_evaluation.py -- Validate document extraction accuracy
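When one response is scored on several criteria, as text_evaluation.py does for safety, relevance, and compliance, the scores have to be combined somehow. One reasonable choice is to require every criterion to pass rather than averaging. The sketch below assumes each judge returns a score in [0, 1] with its own threshold. `combine` and the threshold values are hypothetical.

```python
from typing import Mapping


def combine(scores: Mapping[str, float], thresholds: Mapping[str, float]) -> dict:
    """All-must-pass aggregation: list the failing criteria instead of averaging them away."""
    missing = sorted(set(thresholds) - set(scores))
    if missing:
        raise KeyError(f"no score for criteria: {missing}")
    failed = sorted(name for name, cutoff in thresholds.items() if scores[name] < cutoff)
    return {"passed": not failed, "failed": failed}


# Illustrative scores and per-criterion thresholds, not values from the samples.
result = combine(
    {"safety": 0.95, "relevance": 0.6, "compliance": 0.9},
    {"safety": 0.9, "relevance": 0.7, "compliance": 0.8},
)
```

These three scores average about 0.82, which a single 0.8 cutoff would accept. The per-criterion rule still reports `relevance` as failing, so a strong score on one dimension cannot mask a weak one.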

OpenClaw Agent Evaluation (10 demos + skill)

Located in samples/openclaw/. Trace, evaluate, and monitor OpenClaw autonomous AI agents using LayerLens -- including cage match model tournaments, code gating, drift detection, content auditing, honeypot skill auditing, and adversarial red-teaming.

See the OpenClaw README for the full list of integration samples and advanced evaluation patterns.

MCP Server (1 sample)

Located in samples/mcp/. Expose LayerLens capabilities as tools for Claude, Cursor, and any MCP-compatible AI assistant.

layerlens_server.py -- MCP server with trace management, judge creation, and evaluation execution

See the MCP README for setup instructions.

CopilotKit Integration

Located in samples/copilotkit/. A full-stack canvas + chat sample built on langchain.agents.create_agent + CopilotKitMiddleware, with a runnable Next.js 16 + Tailwind 4 + shadcn/ui demo app under app/. The pattern mirrors CopilotKit's own coagents-research-canvas reference: state-driven cards on the host page, a chat sidebar with a frontend HITL widget, and out-of-band polling for long-running async work.

agents/evaluator_agent.py -- LangGraph agent with four backend tools (list_recent_traces, list_judges, run_trace_evaluation, get_evaluation_result) and a frontend HITL tool (confirm_judge) for picking which judge to apply. The picker is a real React widget registered via useCopilotAction({ renderAndWaitForResponse }), bridged into the LLM's toolbelt by CopilotKitMiddleware -- no interrupt() call.
agents/investigator_agent.py -- Standalone procedural StateGraph for trace investigation (errors / latency / cost hot spots). No HITL, no LLM. Reference for non-conversational agents.
components/*.tsx -- Five reusable SDK card components (EvaluationCard, TraceCard, JudgeVerdictCard, MetricCard, ComplianceCard) plus MarkdownLite, re-exported as @layerlens/copilotkit-cards.
app/ -- Runnable Next.js + FastAPI demo that talks to the real LayerLens API only; a missing LAYERLENS_STRATIX_API_KEY is a hard error at startup.

Checkpointer note: The evaluator graph is compiled with InMemorySaver so ag_ui_langgraph's endpoint can call graph.aget_state(config) per request -- without it the AG-UI handler errors with "No checkpointer set" before any tool runs. The sample ships InMemorySaver for zero-setup local development; production deployments should swap to a durable saver (Postgres / SQLite / Redis / LangGraph Platform). See the sample's README for the full architecture walkthrough.

See the CopilotKit README for the full list.

Claude Code Skills (6 skills)

Located in samples/claude-code/. Slash commands that bring LayerLens workflows directly into the Claude Code CLI -- manage traces, judges, evaluations, optimizations, benchmarks, and investigations without leaving your terminal.

See the Claude Code Skills README for the full list.

Sample Data

Located in samples/data/. Pre-built trace files, test datasets, and 16 industry-specific evaluation datasets so you can run every sample without generating your own data first.

See the Sample Data README for contents.

Full Sample Reference

For the complete table of every sample with descriptions, see the samples README.
