
Code Examples

This section provides practical code examples for common SDK use cases. All examples are available as runnable scripts in the samples/ directory.

Quick Reference

| Sample | Description |
| --- | --- |
| benchmark_evaluation.py | Run a model against a benchmark, wait for completion, retrieve results |
| quickstart.py | Minimal end-to-end trace evaluation |
| async_workflow.py | Full async evaluation workflow with concurrent operations |
| async_results.py | Fetch results for multiple evaluations concurrently |
| model_benchmark_management.py | Filter models by name/company/region, add/remove from project |
| evaluation_filtering.py | Sort and filter evaluations by status, accuracy, date |
| compare_evaluations.py | Compare two models on a benchmark with outcome filtering |
| paginated_results.py | Paginate through results or fetch all at once |
| custom_model.py | Register a custom model with an OpenAI-compatible API |
| custom_benchmark.py | Create custom and smart benchmarks from data files |
| create_judge.py | Create, list, update, and delete judges |
| basic_trace.py | Upload, list, get, and delete traces |
| trace_evaluation.py | Run judges on traces, estimate cost, get results with steps |
| judge_optimization.py | Estimate, run, and apply judge optimizations |
| public_catalog.py | Browse public models, benchmarks, evaluations, and prompts |
| integration_management.py | List, inspect, and test configured integrations |
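
As a rough illustration of the concurrent-fetch pattern that async_results.py is described as covering, the sketch below uses only Python's standard `asyncio` module. The `fetch_results` coroutine and the evaluation IDs are hypothetical stand-ins, not SDK calls; this README does not show the SDK's client or method names, so refer to the script in samples/ for the actual API.

```python
import asyncio


# Hypothetical stand-in for an SDK call. The real client and method names
# are not shown in this README; see samples/async_results.py for them.
async def fetch_results(evaluation_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"evaluation_id": evaluation_id, "status": "completed"}


async def fetch_all(evaluation_ids: list[str]) -> list[dict]:
    # asyncio.gather runs the coroutines concurrently and returns their
    # results in the same order as the inputs.
    return await asyncio.gather(*(fetch_results(e) for e in evaluation_ids))


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    results = asyncio.run(fetch_all(["eval-1", "eval-2", "eval-3"]))
    for result in results:
        print(result["evaluation_id"], result["status"])
```

Because `asyncio.gather` preserves input order, each result can be matched back to the evaluation ID that was requested without extra bookkeeping.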

Guides

For the complete samples catalog including industry solutions, OpenClaw agent evaluation, CI/CD integration, and more, see the Samples Guide.