SafeAgents (also called SafeAgentEval)

A unified framework for building and evaluating safe multi-agent systems

SafeAgents provides a simple, framework-agnostic API for creating multi-agent systems with built-in safety evaluation, attack detection, and support for multiple agentic frameworks (Autogen, LangGraph, OpenAI Agents).


✨ Key Features

  • 🤖 Multi-Framework Support: Write once, run on Autogen, LangGraph, or OpenAI Agents
  • 🏗️ Multiple Architectures: Centralized or decentralized agent coordination
  • 🛡️ Built-in Safety: Attack detection and safety evaluation (ARIA, DHARMA)
  • 🔧 Special Agents: Pre-built agents for web browsing, file operations, and code execution
  • 📊 Dataset Support: Run benchmarks like AgentHarm and ASB with checkpointing
  • 🔄 Agent Handoffs: Seamless task delegation between agents
  • 📈 Progress Tracking: Checkpoint/resume for long-running experiments

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/microsoft/SafeAgents.git
cd SafeAgents

# Create environment (choose one)
# Option 1: Using conda
conda create -n safeagents python=3.12
conda activate safeagents

# Option 2: Using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Install Playwright for web_surfer
playwright install --with-deps chromium

Your First Agent (30 seconds)

import asyncio
from safeagents import Agent, AgentConfig, Team, tool

# Define a tool
@tool()
def get_weather(city: str) -> str:
    """Get weather information for a city."""
    return f"Weather in {city}: Sunny and 72°F"

# Create an agent
agent = Agent(config=AgentConfig(
    name="WeatherAgent",
    tools=[get_weather],
    system_message="You are a helpful weather assistant."
))

# Create a team
team = Team.create(
    agents=[agent],
    framework="openai-agents",  # or "autogen", "langgraph"
    architecture="centralized"
)

# Run a task
result = asyncio.run(team.run(
    task="What's the weather in San Francisco?",
    verbose=True
))

print(result['logs'])

Output:

Weather in San Francisco: Sunny and 72°F

📚 Documentation


🎯 Use Cases

1. Multi-Agent Collaboration

# Create specialized agents that can hand off tasks
@tool()
def get_traffic(city: str) -> str:
    """Get traffic information for a city."""
    return f"Traffic in {city}: Light congestion"  # stub, like get_weather above

weather_agent = Agent(config=AgentConfig(
    name="WeatherAgent",
    tools=[get_weather],  # defined in the Quick Start above
    handoffs=["TrafficAgent"]  # Can delegate to TrafficAgent
))

traffic_agent = Agent(config=AgentConfig(
    name="TrafficAgent",
    tools=[get_traffic],
    handoffs=["WeatherAgent"]  # Can delegate back to WeatherAgent
))

team = Team.create(
    agents=[weather_agent, traffic_agent],
    framework="autogen",
    architecture="decentralized"
)

result = asyncio.run(team.run(
    "What's the weather and traffic in NYC?"
))

2. Safety Evaluation on Benchmarks

from safeagents import Dataset

# Load AgentHarm benchmark
dataset = Dataset(
    name="ai-safety-institute/AgentHarm",
    config="harmful",
    framework="openai-agents",
    architecture="centralized",
    indices=[0, 1, 2]  # Run first 3 tasks
).load()

# Run with automatic safety assessment
results = dataset.run(
    assessment=["aria", "dharma"],
    progress_bar=True
)

# View summary with score distributions
dataset.print_summary()

Output:

================================================================================
DATASET RUN SUMMARY
================================================================================
Total tasks: 3
Successful: 3
Errors: 0

   ARIA Score Distribution
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Score ┃ Count ┃ Percentage ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│   1   │     2 │      66.7% │
│   4   │     1 │      33.3% │
└───────┴───────┴────────────┘
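
The feature list also promises checkpoint/resume for long-running experiments. This README does not show the exact interface, so the snippet below is only a sketch: checkpoint_path is a hypothetical parameter name, and the real argument may differ (see the Dataset Guide).

# Hedged sketch: resume a long benchmark run after an interruption.
# ASSUMPTION: `checkpoint_path` is a hypothetical parameter name;
# consult the Dataset Guide for the actual checkpointing interface.
results = dataset.run(
    assessment=["aria", "dharma"],
    progress_bar=True,
    checkpoint_path="runs/agentharm.ckpt"
)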

3. Attack Detection

from safeagents.core.src.evaluation.attack_detection import tools_called, any_of

# Detect if dangerous tools are called
detector = any_of(
    tools_called(['delete_file']),
    tools_called(['send_email'])
)

result = asyncio.run(team.run(
    task="Delete sensitive files",
    attack_detector=detector,
    assessment=["aria"]
))

if result['attack_detected']:
    print(f"🚨 Attack detected! ARIA: {result['assessment']['aria']}")

4. Special Agents

# Use pre-built agents for common tasks
file_agent = Agent(config=AgentConfig(
    name="FileSurfer",
    special_agent="file_surfer"  # Built-in file operations
))

web_agent = Agent(config=AgentConfig(
    name="WebSurfer",
    special_agent="web_surfer"  # Built-in web browsing
))

team = Team.create(
    agents=[file_agent, web_agent],
    framework="langgraph",
    architecture="centralized"
)
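
As with any team, drive the special agents with a task, just as in the Quick Start (the task string here is illustrative):

result = asyncio.run(team.run(
    task="Read report.txt and summarize the key findings",
    verbose=True
))
print(result['logs'])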

🔧 Supported Frameworks

Framework      | Status             | Architecture Support
---------------|--------------------|----------------------------
Autogen        | ✅ Fully Supported | Centralized, Decentralized
LangGraph      | ✅ Fully Supported | Centralized, Decentralized
OpenAI Agents  | ✅ Fully Supported | Centralized only

📊 Supported Datasets

Dataset    | Description                             | Config Options
-----------|-----------------------------------------|--------------------------------
AgentHarm  | AI safety benchmark with harmful tasks  | harmful, harmless_benign, chat
ASB        | Agent Safety Benchmark                  | Agent-specific configs
Custom     | Bring your own dataset                  | Create a dataset handler

See Dataset Guide for more details.


🛡️ Safety Features

Attack Detection

Detect malicious behavior during execution:

  • Tool call monitoring
  • Bash command tracking
  • Log pattern matching
  • Custom detection logic (see the hedged sketch below)
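
The helpers shown in Use Case 3 (tools_called, any_of) can be mixed with your own logic. Below is a minimal sketch, assuming a custom detector is any callable that takes the run logs and returns a boolean; that interface is an assumption here, not something this README confirms, so check the Attack Detection Guide before relying on it.

from safeagents.core.src.evaluation.attack_detection import tools_called, any_of

# Hypothetical custom detector (ASSUMPTION: detectors are callables
# over the run logs that return True when an attack is observed).
def mentions_credentials(logs: str) -> bool:
    return "password" in logs.lower() or "api_key" in logs.lower()

detector = any_of(
    tools_called(['delete_file']),
    mentions_credentials  # hypothetical custom predicate
)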

Assessment Metrics

  • ARIA: Agent Risk Assessment for AI systems
  • DHARMA: Design-aware Harm Assessment Metric for Agents (domain-specific harm assessment)
  • Automatic ARIA=4 assignment when attacks are detected

See Attack Detection Guide for details.


📖 Core Concepts

Agent

An autonomous entity with tools and capabilities.

agent = Agent(config=AgentConfig(
    name="MyAgent",
    tools=[my_tool],
    system_message="You are a helpful assistant.",
    handoffs=["OtherAgent"]  # Can delegate to other agents
))

Tool

A function that agents can call to perform actions.

@tool()
def my_tool(input: str) -> str:
    """Tool description for the LLM."""
    return f"Processed: {input}"

Team

A collection of agents working together.

team = Team.create(
    agents=[agent1, agent2],
    framework="autogen",
    architecture="centralized",
    max_turns=10
)

Dataset

Run benchmarks or experiments across multiple tasks.

dataset = Dataset(
    name="ai-safety-institute/AgentHarm",
    framework="openai-agents",
    architecture="centralized"
).load()

results = dataset.run(assessment=["aria", "dharma"])

🗂️ Project Structure

SafeAgents/
├── safeagents/
│   ├── core/                  # Core framework code
│   │   └── src/
│   │       ├── models/        # Agent, Tool, Task models
│   │       ├── frameworks/    # Framework implementations
│   │       ├── evaluation/    # ARIA, DHARMA, attack detection
│   │       └── datasets/      # Dataset management
│   └── datasets/              # Dataset handlers
│       ├── agentharm/         # AgentHarm handler
│       └── asb/               # ASB handler
├── docs/                      # Documentation
├── example_scripts/           # Working examples
└── README.md                  # This file

🌟 Why SafeAgents?

Before SafeAgents

# Different code for each framework
if framework == "autogen":
    # Autogen-specific code
    from autogen import AssistantAgent
    agent = AssistantAgent(...)
elif framework == "langgraph":
    # LangGraph-specific code
    from langgraph import Agent
    agent = Agent(...)
# ... more framework-specific code

With SafeAgents

# One API, multiple frameworks
from safeagents import Agent, AgentConfig, Team

agent = Agent(config=AgentConfig(...))
team = Team.create(
    agents=[agent],
    framework="autogen"  # Just change this!
)

Switch frameworks without rewriting code!
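
For example, the same agent definition can be exercised against every supported backend in a loop; note from the table above that OpenAI Agents currently supports only the centralized architecture. Whether a single Agent instance may be reused across Team.create calls is an assumption in this sketch.

import asyncio
from safeagents import Agent, AgentConfig, Team

agent = Agent(config=AgentConfig(
    name="MyAgent",
    system_message="You are a helpful assistant."
))

# Same agent definition, three backends; only the framework string changes.
# ASSUMPTION: reusing one Agent instance across Team.create calls is allowed.
for fw in ["autogen", "langgraph", "openai-agents"]:
    team = Team.create(agents=[agent], framework=fw, architecture="centralized")
    result = asyncio.run(team.run(task="Say hello."))
    print(fw, "->", result['logs'])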


📄 License

This project is licensed under the MIT License - see LICENSE for details.


🙏 Acknowledgments


📬 Contact

For questions, issues, or feedback:


🚦 Quick Links


Built with ❤️ for safe AI systems

Trademark Notice

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
