
Commit 29fa011

adamamer20, Ben-geo, and pre-commit-ci[bot] authored
Refactor examples with common interface, plotting and benchmarking (#188)
* Add Sugarscape IG examples with Mesa and Mesa-Frames backends
  - Implemented a new backend using Mesa with sequential updates in `examples/sugarscape_ig/backend_mesa`.
  - Created agent and model classes for the Sugarscape simulation, including movement and sugar management.
  - Added a CLI interface using Typer for running simulations and saving results.
  - Introduced utility classes for handling simulation results from both Mesa and Mesa-Frames backends.
  - Added a new backend using Mesa-Frames with parallel updates in `examples/sugarscape_ig/backend_frames`.
  - Implemented model-level reporters for the Gini coefficient and correlations between agent traits.
  - Included CSV output and plotting capabilities for simulation metrics.
* Update .gitignore and pyproject.toml for benchmarks and new dependencies
* Update README.md: remove redundant documentation section on related documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Add clarifying comment for benchmarking initialization time in the _parse_agents function
* Refactor run function to use Optional[Path] for results_dir and compute the default at call time
* Update uv.lock
* Fix typo in README for the results-dir option description
* Add user feedback for saved results in the run function
* Remove unused imports from backend_mesa.py
* Add confirmation message for saved CSV results in the run function
* Remove unnecessary blank line in the run function
* Remove redundant seed value assignment in the run function
* Fix model type annotation in the AntAgent constructor
* Fix hyphenation in README for clarity on agents' population dynamics
* Remove unused pandas import from model.py
* Enhance legend styling in plot functions for better readability across themes
* Enhance run command to support multiple model and agent inputs for improved flexibility
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Remove unused pandas import from backend_mesa.py
* Fix documentation links in agents.py and model.py to point to the correct tutorial path
* Refactor gini function to simplify sugar array sorting
* Fix order of exports in plotting.py to include plot_model_metrics
* Enhance CLI output for benchmark results and add tests for CSV saving logic
* Format code for better readability in benchmark and sugarscape tests

---------

Co-authored-by: Ben Geo Abraham <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 172cf28 · commit 29fa011

File tree

22 files changed (+3124, -38 lines)


.gitignore

Lines changed: 3 additions & 2 deletions
```diff
@@ -155,9 +155,10 @@ cython_debug/
 llm_rules.md
 .python-version

-benchmarks/results/*
+benchmarks/**/results
+benchmarks/**/plots
 docs/api/_build/*
 docs/api/reference/*
-examples/**/results/*
+examples/**/results
 docs/general/**/data_*
 docs/site/*
```

benchmarks/README.md

Lines changed: 84 additions & 0 deletions
# Benchmarks

Performance benchmarks compare Mesa Frames backends ("frames") with classic Mesa
("mesa") implementations for a small set of representative models. They help track
runtime scaling and regressions.

Currently included models:

- **boltzmann**: Simple wealth-exchange ("Boltzmann wealth") model.
- **sugarscape**: Sugarscape Immediate Growback variant (square grid sized relative to agent count).

## Quick start

```bash
uv run benchmarks/cli.py
```

That command (with defaults) will:

- Benchmark both models (`boltzmann`, `sugarscape`).
- Use agent counts 1000, 2000, 3000, 4000, 5000.
- Run 100 steps per simulation.
- Run each configuration once.
- Save CSV results and generate plots.

## CLI options

Invoke `uv run benchmarks/cli.py --help` to see the full help text. Key options:

| Option | Default | Description |
| ------ | ------- | ----------- |
| `--models` | `all` | Comma-separated list or `all`; accepted: `boltzmann`, `sugarscape`. |
| `--agents` | `1000:5000:1000` | Single integer or a range `start:stop:step`. |
| `--steps` | `100` | Steps per simulation run. |
| `--repeats` | `1` | Repeats per (model, backend, agents) configuration; the seed increments per repeat. |
| `--seed` | `42` | Base RNG seed, incremented by the repeat index. |
| `--save / --no-save` | `--save` | Persist per-model CSVs. |
| `--plot / --no-plot` | `--plot` | Generate scaling plots (PNG, plus other themed variants if enabled). |
| `--results-dir` | `benchmarks/results` | Root directory that will receive a timestamped subdirectory. |

Range parsing: `A:B:S` expands to `A, A+S, ...` up to and including `B`; a final
value greater than `B` is dropped. For example, `1000:5000:1000` yields
1000, 2000, 3000, 4000, 5000.

## Output layout

Each invocation uses a single UTC timestamp, e.g. `20251016_173702`:

```text
benchmarks/
  results/
    20251016_173702/
      boltzmann_perf_20251016_173702.csv
      sugarscape_perf_20251016_173702.csv
      plots/
        boltzmann_runtime_20251016_173702_dark.png
        sugarscape_runtime_20251016_173702_dark.png
        ... (other themed variants if enabled)
```

CSV schema (one row per completed run):

| Column | Meaning |
| ------ | ------- |
| `model` | Model key (`boltzmann`, `sugarscape`). |
| `backend` | `mesa` or `frames`. |
| `agents` | Agent count for that run. |
| `steps` | Steps simulated. |
| `seed` | Seed used (base seed + repeat index). |
| `repeat_idx` | Repeat counter starting at 0. |
| `runtime_seconds` | Wall-clock runtime for that run. |
| `timestamp` | Shared timestamp identifier for the benchmark batch. |

## Performance tips

- Ensure the environment variable `MESA_FRAMES_RUNTIME_TYPECHECKING` is **unset** or set to `0` / `false` when collecting performance numbers (e.g. `MESA_FRAMES_RUNTIME_TYPECHECKING=0 uv run benchmarks/cli.py`). Enabling it adds runtime type-validation overhead, and the CLI will warn you.
- Run multiple repeats (e.g. `--repeats 5`) to smooth out variance.

## Extending benchmarks

To benchmark an additional model:

1. Add or import both a Mesa implementation and a Frames implementation exposing a `simulate(agents: int, steps: int, seed: int | None, ...)` function.
2. Register it in `benchmarks/cli.py` inside the `MODELS` dict with two backends (names must be `mesa` and `frames`), as in the sketch after this list.
3. Ensure any extra spatial parameters are derived from `agents` inside the runner lambda (see the sugarscape example).
4. Run the CLI to verify that the new CSV columns still align.
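
For concreteness, here is a minimal registration sketch. The `ring` model and the `examples.my_ring` modules are hypothetical placeholders; `MODELS`, `ModelConfig`, and `Backend` are the objects defined in `benchmarks/cli.py`.

```python
# Hypothetical sketch: registering a new "ring" model in benchmarks/cli.py.
# examples.my_ring does not exist in this repo; substitute your own modules
# exposing simulate(agents: int, steps: int, seed: int | None = None).
from examples.my_ring import backend_frames as ring_frames  # hypothetical
from examples.my_ring import backend_mesa as ring_mesa  # hypothetical

MODELS["ring"] = ModelConfig(
    name="ring",
    backends=[
        # Backend names must be exactly "mesa" and "frames" so CSV rows and
        # plot legends line up with the existing models.
        Backend(name="mesa", runner=ring_mesa.simulate),
        Backend(name="frames", runner=ring_frames.simulate),
    ],
)
```

If your `simulate` needs extra parameters (e.g. grid width/height), wrap it in a `lambda agents, steps, seed=None: ...` that derives them from `agents`, as the sugarscape entry in `MODELS` does.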

benchmarks/cli.py

Lines changed: 285 additions & 0 deletions
```python
"""Typer CLI for running mesa vs mesa-frames performance benchmarks."""

from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
import os
from pathlib import Path
from time import perf_counter
from typing import Literal, Annotated, Protocol, Optional

import math
import polars as pl
import typer

from examples.boltzmann_wealth import backend_frames as boltzmann_frames
from examples.boltzmann_wealth import backend_mesa as boltzmann_mesa
from examples.sugarscape_ig.backend_frames import model as sugarscape_frames
from examples.sugarscape_ig.backend_mesa import model as sugarscape_mesa
from examples.plotting import (
    plot_performance as _examples_plot_performance,
)

app = typer.Typer(add_completion=False)


class RunnerP(Protocol):
    def __call__(self, agents: int, steps: int, seed: int | None = None) -> None: ...


@dataclass(slots=True)
class Backend:
    name: Literal["mesa", "frames"]
    runner: RunnerP


@dataclass(slots=True)
class ModelConfig:
    name: str
    backends: list[Backend]


MODELS: dict[str, ModelConfig] = {
    "boltzmann": ModelConfig(
        name="boltzmann",
        backends=[
            Backend(name="mesa", runner=boltzmann_mesa.simulate),
            Backend(name="frames", runner=boltzmann_frames.simulate),
        ],
    ),
    "sugarscape": ModelConfig(
        name="sugarscape",
        backends=[
            Backend(
                name="mesa",
                runner=lambda agents, steps, seed=None: sugarscape_mesa.simulate(
                    agents=agents,
                    steps=steps,
                    width=int(max(20, math.ceil((agents) ** 0.5) * 2)),
                    height=int(max(20, math.ceil((agents) ** 0.5) * 2)),
                    seed=seed,
                ),
            ),
            Backend(
                name="frames",
                # Benchmarks expect a runner signature (agents: int, steps: int, seed: int | None).
                # Sugarscape's frames simulate requires width/height; choose a square close to the agent count.
                runner=lambda agents, steps, seed=None: sugarscape_frames.simulate(
                    agents=agents,
                    steps=steps,
                    width=int(max(20, math.ceil((agents) ** 0.5) * 2)),
                    height=int(max(20, math.ceil((agents) ** 0.5) * 2)),
                    seed=seed,
                ),
            ),
        ],
    ),
}


def _parse_agents(value: str) -> list[int]:
    value = value.strip()
    if ":" in value:
        parts = value.split(":")
        if len(parts) != 3:
            raise typer.BadParameter("Ranges must use start:stop:step format")
        try:
            start, stop, step = (int(part) for part in parts)
        except ValueError as exc:
            raise typer.BadParameter("Range values must be integers") from exc
        if step <= 0:
            raise typer.BadParameter("Step must be positive")
        # We keep start = 0 to benchmark initialization time
        if start < 0 or stop <= 0:
            raise typer.BadParameter("Range endpoints must be positive")
        if start > stop:
            raise typer.BadParameter("Range start must be <= stop")
        counts = list(range(start, stop + step, step))
        if counts[-1] > stop:
            counts.pop()
        return counts
    try:
        agents = int(value)
    except ValueError as exc:  # pragma: no cover - defensive
        raise typer.BadParameter("Agent count must be an integer") from exc
    if agents <= 0:
        raise typer.BadParameter("Agent count must be positive")
    return [agents]


def _parse_models(value: str) -> list[str]:
    """Parse the models option into a list of model keys.

    Accepts:
    - "all" -> returns all available model keys
    - a single model name -> returns [name]
    - a comma-separated list of model names -> returns the list

    Validates that each selected model exists in MODELS.
    """
    value = value.strip()
    if value == "all":
        return list(MODELS.keys())
    # Support comma-separated lists.
    parts = [part.strip() for part in value.split(",") if part.strip()]
    if not parts:
        raise typer.BadParameter("Model selection must not be empty")
    unknown = [p for p in parts if p not in MODELS]
    if unknown:
        raise typer.BadParameter(f"Unknown model selection: {', '.join(unknown)}")
    # Preserve order and uniqueness.
    seen = set()
    result: list[str] = []
    for p in parts:
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


def _plot_performance(
    df: pl.DataFrame, model_name: str, output_dir: Path, timestamp: str
) -> None:
    """Wrap examples.plotting.plot_performance to ensure consistent theming.

    The original benchmark implementation used simple seaborn styles (whitegrid /
    darkgrid). Our example plotting utilities define a much darker, high-contrast
    *true* dark theme (custom rc params overriding bg/fg colors). Reuse that logic
    here so the benchmark dark plots match the example dark plots users see elsewhere.
    """
    if df.is_empty():
        return
    stem = f"{model_name}_runtime_{timestamp}"
    _examples_plot_performance(
        df.select(["agents", "runtime_seconds", "backend"]),
        output_dir=output_dir,
        stem=stem,
        # Prefer more concise, publication-style wording.
        title=f"{model_name.title()} runtime scaling",
    )


@app.command()
def run(
    models: Annotated[
        str | list[str],
        typer.Option(
            help="Models to benchmark: boltzmann, sugarscape, or all",
            callback=_parse_models,
        ),
    ] = "all",
    agents: Annotated[
        str | list[int],
        typer.Option(
            help="Agent count or range (start:stop:step)", callback=_parse_agents
        ),
    ] = "1000:5000:1000",
    steps: Annotated[
        int,
        typer.Option(
            min=0,
            help="Number of steps per run.",
        ),
    ] = 100,
    repeats: Annotated[int, typer.Option(help="Repeats per configuration.", min=1)] = 1,
    seed: Annotated[int, typer.Option(help="Optional RNG seed.")] = 42,
    save: Annotated[bool, typer.Option(help="Persist benchmark CSV results.")] = True,
    plot: Annotated[bool, typer.Option(help="Render performance plots.")] = True,
    results_dir: Annotated[
        Path | None,
        typer.Option(
            help=(
                "Base directory for benchmark outputs. A timestamped subdirectory "
                "(e.g. results/20250101_120000) is created with CSV files at the root "
                "and a 'plots/' subfolder for images. Defaults to the module's results directory."
            ),
        ),
    ] = None,
) -> None:
    """Run performance benchmarks for the selected models."""
    # Support both CLI (via callbacks) and direct function calls.
    if isinstance(models, str):
        models = _parse_models(models)
    if isinstance(agents, str):
        agents = _parse_agents(agents)
    # Ensure the module-relative default is computed at call time (avoids import-time side effects).
    if results_dir is None:
        results_dir = Path(__file__).resolve().parent / "results"

    runtime_typechecking = os.environ.get("MESA_FRAMES_RUNTIME_TYPECHECKING", "")
    if runtime_typechecking and runtime_typechecking.lower() not in {"0", "false"}:
        typer.secho(
            "Warning: MESA_FRAMES_RUNTIME_TYPECHECKING is enabled; benchmarks may run significantly slower.",
            fg=typer.colors.YELLOW,
        )
    rows: list[dict[str, object]] = []
    # Single timestamp per CLI invocation so all model results are co-located.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    # Create the unified output layout: <results_dir>/<timestamp>/{CSV files, plots/}
    base_results_dir = results_dir
    timestamp_dir = (base_results_dir / timestamp).resolve()
    plots_subdir: Path = timestamp_dir / "plots"
    for model in models:
        config = MODELS[model]
        typer.echo(f"Benchmarking {model} with agents {agents}")
        for agents_count in agents:
            for repeat_idx in range(repeats):
                run_seed = seed + repeat_idx
                for backend in config.backends:
                    start = perf_counter()
                    backend.runner(agents_count, steps, run_seed)
                    runtime = perf_counter() - start
                    rows.append(
                        {
                            "model": model,
                            "backend": backend.name,
                            "agents": agents_count,
                            "steps": steps,
                            "seed": run_seed,
                            "repeat_idx": repeat_idx,
                            "runtime_seconds": runtime,
                            "timestamp": timestamp,
                        }
                    )
                    # Report completion of this run to the CLI.
                    typer.echo(
                        f"Completed {backend.name} for model={model} agents={agents_count} steps={steps} seed={run_seed} repeat={repeat_idx} in {runtime:.3f}s"
                    )
        # Finished all runs for this model.
        typer.echo(f"Finished benchmarking model {model}")

    if not rows:
        typer.echo("No benchmark data collected.")
        return
    df = pl.DataFrame(rows)
    if save:
        timestamp_dir.mkdir(parents=True, exist_ok=True)
        for model in models:
            model_df = df.filter(pl.col("model") == model)
            csv_path = timestamp_dir / f"{model}_perf_{timestamp}.csv"
            model_df.write_csv(csv_path)
            typer.echo(f"Saved {model} results to {csv_path}")
    if plot:
        plots_subdir.mkdir(parents=True, exist_ok=True)
        for model in models:
            model_df = df.filter(pl.col("model") == model)
            _plot_performance(model_df, model, plots_subdir, timestamp)
            typer.echo(f"Saved {model} plots under {plots_subdir}")

    destinations: list[str] = []
    if save:
        destinations.append(f"CSVs under {timestamp_dir}")
    if plot:
        destinations.append(f"plots under {plots_subdir}")

    if destinations:
        typer.echo("Unified benchmark outputs written: " + "; ".join(destinations))
    else:
        typer.echo(
            "Benchmark run completed (save=False, plot=False; no files written)."
        )


if __name__ == "__main__":
    app()
```
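
Since `run` normalizes string arguments itself (the `isinstance` checks at the top of the function), it can be invoked directly from Python as well as through the Typer CLI. A minimal sketch, assuming the repository root is on `sys.path` so that `benchmarks/cli.py` is importable:

```python
# Direct-call sketch (hypothetical import path; assumes the repo root is on
# sys.path). The string arguments below are normalized by the same
# _parse_models/_parse_agents helpers that back the CLI callbacks.
from benchmarks.cli import run

run(models="boltzmann", agents="1000:2000:1000", steps=10, repeats=1)
```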
