Skip to content

Latest commit

 

History

History
34 lines (28 loc) · 2.83 KB

File metadata and controls

34 lines (28 loc) · 2.83 KB

Code Examples

This section provides practical code examples for common SDK use cases. All examples are available as runnable scripts in the samples/ directory.

Quick Reference

Sample Description
benchmark_evaluation.py Run a model against a benchmark, wait for completion, retrieve results
quickstart.py Minimal end-to-end trace evaluation
async_workflow.py Full async evaluation workflow with concurrent operations
async_results.py Fetch results for multiple evaluations concurrently
model_benchmark_management.py Filter models by name/company/region, add/remove from project
evaluation_filtering.py Sort and filter evaluations by status, accuracy, date
compare_evaluations.py Compare two models on a benchmark with outcome filtering
paginated_results.py Paginate through results or fetch all at once
custom_model.py Register a custom model with an OpenAI-compatible API
custom_benchmark.py Create custom and smart benchmarks from data files
create_judge.py Create, list, update, and delete judges
basic_trace.py Upload, list, get, and delete traces
trace_evaluation.py Run judges on traces, estimate cost, get results with steps
judge_optimization.py Estimate, run, and apply judge optimizations
public_catalog.py Browse public models, benchmarks, evaluations, and prompts
integration_management.py List, inspect, and test configured integrations

Guides

For the complete samples catalog including industry solutions, OpenClaw agent evaluation, CI/CD integration, and more, see the Samples Guide.