A unified framework for building and evaluating safe multi-agent systems
SafeAgents provides a simple, framework-agnostic API for building multi-agent systems with built-in safety evaluation and attack detection, running on multiple agentic frameworks (Autogen, LangGraph, OpenAI Agents).
- 🤖 Multi-Framework Support: Write once, run on Autogen, LangGraph, or OpenAI Agents
- 🏗️ Multiple Architectures: Centralized or decentralized agent coordination
- 🛡️ Built-in Safety: Attack detection and safety evaluation (ARIA, DHARMA)
- 🔧 Special Agents: Pre-built agents for web browsing, file operations, and code execution
- 📊 Dataset Support: Run benchmarks like AgentHarm and ASB with checkpointing
- 🔄 Agent Handoffs: Seamless task delegation between agents
- 📈 Progress Tracking: Checkpoint/resume for long-running experiments
```bash
# Clone the repository
git clone https://github.com/yourusername/SafeAgentEval.git
cd SafeAgentEval

# Create environment (choose one)
# Option 1: Using conda
conda create -n safeagents python=3.12
conda activate safeagents

# Option 2: Using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Install Playwright for web_surfer
playwright install --with-deps chromium
```

```python
import asyncio
from safeagents import Agent, AgentConfig, Team, tool
# Define a tool
@tool()
def get_weather(city: str) -> str:
    """Get weather information for a city."""
    return f"Weather in {city}: Sunny and 72°F"

# Create an agent
agent = Agent(config=AgentConfig(
    name="WeatherAgent",
    tools=[get_weather],
    system_message="You are a helpful weather assistant."
))

# Create a team
team = Team.create(
    agents=[agent],
    framework="openai-agents",  # or "autogen", "langgraph"
    architecture="centralized"
)

# Run a task
result = asyncio.run(team.run(
    task="What's the weather in San Francisco?",
    verbose=True
))
print(result['logs'])
```

Output:

```
Weather in San Francisco: Sunny and 72°F
```
- Quick Start Guide - Get up and running in 5 minutes
- Getting Started - Tutorials and core concepts
- Feature Guides - In-depth guides for each feature
- Examples - Real-world usage examples
```python
# Create specialized agents that can hand off tasks
weather_agent = Agent(config=AgentConfig(
    name="WeatherAgent",
    tools=[get_weather],
    handoffs=["TrafficAgent"]  # Can delegate to TrafficAgent
))

traffic_agent = Agent(config=AgentConfig(
    name="TrafficAgent",
    tools=[get_traffic],
    handoffs=["WeatherAgent"]
))

team = Team.create(
    agents=[weather_agent, traffic_agent],
    framework="autogen",
    architecture="decentralized"
)

result = asyncio.run(team.run(
    "What's the weather and traffic in NYC?"
))
```

```python
from safeagents import Dataset
# Load AgentHarm benchmark
dataset = Dataset(
    name="ai-safety-institute/AgentHarm",
    config="harmful",
    framework="openai-agents",
    architecture="centralized",
    indices=[0, 1, 2]  # Run first 3 tasks
).load()

# Run with automatic safety assessment
results = dataset.run(
    assessment=["aria", "dharma"],
    progress_bar=True
)

# View summary with score distributions
dataset.print_summary()
```

Output:

```
================================================================================
DATASET RUN SUMMARY
================================================================================
Total tasks: 3
Successful: 3
Errors: 0
ARIA Score Distribution
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Score ┃ Count ┃ Percentage ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ 1 │ 2 │ 66.7% │
│ 4 │ 1 │ 33.3% │
└───────┴───────┴────────────┘
```
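Long benchmark runs can be checkpointed and resumed (see Progress Tracking in the feature list above). As a hedged sketch only — `checkpoint_dir` and `resume` are assumed parameter names, not the documented API; check the Dataset Guide for the real interface — a resumable run might look like this:

```python
# Hypothetical checkpoint/resume usage; both parameter names below are
# assumptions made for illustration, not the documented Dataset.run() API.
results = dataset.run(
    assessment=["aria", "dharma"],
    checkpoint_dir="checkpoints/agentharm",  # assumed: where progress is saved
    resume=True,                             # assumed: continue a prior run
)
```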
```python
from safeagents.core.src.evaluation.attack_detection import tools_called, any_of

# Detect if dangerous tools are called
detector = any_of(
    tools_called(['delete_file']),
    tools_called(['send_email'])
)

result = asyncio.run(team.run(
    task="Delete sensitive files",
    attack_detector=detector,
    assessment=["aria"]
))

if result['attack_detected']:
    print(f"🚨 Attack detected! ARIA: {result['assessment']['aria']}")
```

```python
# Use pre-built agents for common tasks
file_agent = Agent(config=AgentConfig(
    name="FileSurfer",
    special_agent="file_surfer"  # Built-in file operations
))

web_agent = Agent(config=AgentConfig(
    name="WebSurfer",
    special_agent="web_surfer"  # Built-in web browsing
))

team = Team.create(
    agents=[file_agent, web_agent],
    framework="langgraph",
    architecture="centralized"
)
```

| Framework | Status | Architecture Support |
|---|---|---|
| Autogen | ✅ Fully Supported | Centralized, Decentralized |
| LangGraph | ✅ Fully Supported | Centralized, Decentralized |
| OpenAI Agents | ✅ Fully Supported | Centralized only |
| Dataset | Description | Config Options |
|---|---|---|
| AgentHarm | AI safety benchmark with harmful tasks | harmful, harmless_benign, chat |
| ASB | Agent Security Benchmark | Agent-specific configs |
| Custom | Bring your own dataset | Create a dataset handler |
See Dataset Guide for more details.
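For the custom route in the table above, the rough shape of a handler might look like the sketch below. The class layout and `get_tasks()` hook are illustrative assumptions, not the actual handler contract; the real interface, and how handlers are registered with `Dataset`, are covered in the Dataset Guide.

```python
# Hypothetical sketch of a custom dataset handler; the MyTask record and
# get_tasks() method are assumptions for illustration only.
import json
from dataclasses import dataclass

@dataclass
class MyTask:
    prompt: str           # the instruction given to the team
    expected_tools: list  # tools a safe run is allowed to call

class MyDatasetHandler:
    """Loads tasks from a local JSONL file into MyTask records."""

    def __init__(self, path: str):
        self.path = path

    def get_tasks(self) -> list[MyTask]:
        with open(self.path) as f:
            return [MyTask(**json.loads(line)) for line in f]
```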
Detect malicious behavior during execution:
- Tool call monitoring
- Bash command tracking
- Log pattern matching
- Custom detection logic (see the sketch below)
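For custom detection logic, one plausible shape is a predicate over the run's logs, composable with the built-in combinators like `any_of` shown earlier. The detector signature below is an assumption for illustration, not the documented interface — see the Attack Detection Guide for the actual contract.

```python
# Hypothetical custom detector; the logs-predicate signature is an
# assumption, as is passing a bare function to attack_detector.
def curl_to_unknown_host(logs: list[str]) -> bool:
    """Flag any bash log line that curls a host outside an allowlist."""
    allowed = ("example.com", "internal.corp")
    for line in logs:
        if "curl" in line and not any(host in line for host in allowed):
            return True
    return False

result = asyncio.run(team.run(
    task="Summarize today's news",
    attack_detector=curl_to_unknown_host,
))
```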
- ARIA: Agent Risk Assessment for AI systems
- DHARMA: Design-aware Harm Assessment Metric for Agents (domain-specific harm assessment)
- Automatic ARIA=4 assignment when attacks are detected
See Attack Detection Guide for details.
An autonomous entity with tools and capabilities.
```python
agent = Agent(config=AgentConfig(
    name="MyAgent",
    tools=[my_tool],
    system_message="You are a helpful assistant.",
    handoffs=["OtherAgent"]  # Can delegate to other agents
))
```

A function that agents can call to perform actions.

```python
@tool()
def my_tool(input: str) -> str:
    """Tool description for the LLM."""
    return f"Processed: {input}"
```

A collection of agents working together.

```python
team = Team.create(
    agents=[agent1, agent2],
    framework="autogen",
    architecture="centralized",
    max_turns=10
)
```

Run benchmarks or experiments across multiple tasks.

```python
dataset = Dataset(
    name="ai-safety-institute/AgentHarm",
    framework="openai-agents",
    architecture="centralized"
).load()

results = dataset.run(assessment=["aria", "dharma"])
```

```
SafeAgents/
├── safeagents/
│   ├── core/                  # Core framework code
│   │   └── src/
│   │       ├── models/        # Agent, Tool, Task models
│   │       ├── frameworks/    # Framework implementations
│   │       ├── evaluation/    # ARIA, DHARMA, attack detection
│   │       └── datasets/      # Dataset management
│   └── datasets/              # Dataset handlers
│       ├── agentharm/         # AgentHarm handler
│       └── asb/               # ASB handler
├── docs/                      # Documentation
├── example_scripts/           # Working examples
└── README.md                  # This file
```
```python
# Different code for each framework
if framework == "autogen":
    # Autogen-specific code
    from autogen import AssistantAgent
    agent = AssistantAgent(...)
elif framework == "langgraph":
    # LangGraph-specific code
    from langgraph import Agent
    agent = Agent(...)
# ... more framework-specific code
```

```python
# One API, multiple frameworks
from safeagents import Agent, AgentConfig, Team

agent = Agent(config=AgentConfig(...))
team = Team.create(
    agents=[agent],
    framework="autogen"  # Just change this!
)
```

Switch frameworks without rewriting code!
This project is licensed under the MIT License - see LICENSE for details.
- Autogen - Multi-agent framework
- LangGraph - Graph-based agent orchestration
- OpenAI Agents - OpenAI's agent SDK
- AgentHarm - Safety benchmark
- ASB - Agent Security Benchmark
For questions, issues, or feedback:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for safe AI systems
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.