Benchmarking code in Iris is currently scattered across benchmark/ and examples/, with each script re-implementing the same logic (warmup loops, synchronization, timing, averaging, printing). Over time this has led to copy-pasted code, inconsistent measurement patterns, and benchmarks that are hard to reuse or automate.
It would be useful to introduce a small, shared benchmarking harness (e.g. iris.bench) that standardizes:
- warmup and iteration handling
- timing and synchronization
- basic statistics (mean / p50 / p99)
- parameter sweeps
- structured result output (e.g. JSON or dict)
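For the statistics piece, a minimal sketch of what the harness might compute from raw timing samples (the `summarize` helper and its field names are hypothetical, not an existing Iris API):

```python
import statistics

def summarize(samples_ms):
    """Hypothetical stats helper: reduce a list of per-iteration
    timings (in ms) to the mean / p50 / p99 summary the harness
    would report."""
    s = sorted(samples_ms)
    return {
        "mean": statistics.mean(s),
        "p50": s[len(s) // 2],
        # nearest-rank p99; clamp the index for small sample counts
        "p99": s[min(len(s) - 1, round(0.99 * (len(s) - 1)))],
    }

print(summarize([1.0, 2.0, 3.0, 100.0]))
```

Returning a plain dict keeps the result easy to serialize to JSON for the structured-output case.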
This would allow both examples/ and benchmark/ to share the same timing infrastructure, while keeping example code focused on semantics rather than measurement boilerplate.
Example (sketch):

```python
from iris.bench import benchmark

@benchmark(name="gemm_all_scatter", warmup=5, iters=50)
def run(size, world_size):
    # setup tensors
    # launch Iris kernel
    kernel(...)
```
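The decorator itself could be implemented along these lines (a minimal CPU-timing sketch: warmup and iteration handling plus summary stats, with no device synchronization; `iris.bench` does not exist yet, so all names here are proposals):

```python
import time
import statistics
from functools import wraps

def benchmark(name, warmup=5, iters=50):
    """Proposed decorator: run `warmup` untimed calls, then time
    `iters` calls and return summary statistics as a dict."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for _ in range(warmup):      # untimed warmup runs
                fn(*args, **kwargs)
            samples = []
            for _ in range(iters):       # timed iterations
                t0 = time.perf_counter()
                fn(*args, **kwargs)
                samples.append((time.perf_counter() - t0) * 1e3)  # ms
            samples.sort()
            return {
                "name": name,
                "iters": iters,
                "mean_ms": statistics.mean(samples),
                "p50_ms": samples[len(samples) // 2],
                "p99_ms": samples[min(len(samples) - 1,
                                      int(len(samples) * 0.99))],
            }
        return wrapper
    return decorator
```

A real implementation would insert the appropriate GPU synchronization (or delegate to existing Iris timing utilities) around the timed region; the structure above only illustrates the warmup/iteration/stats flow.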
Internally, this could reuse the existing `do_bench`-style timing utilities and any other measurement code we already have in Iris. Such a harness would significantly reduce duplicated code, improve maintainability, and make it easier to add consistent benchmarks and eventually integrate CI performance tracking.
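For the parameter-sweep and structured-output points, a sketch of how a sweep helper might drive a benchmark over a grid and emit JSON (the `sweep` helper and its signature are hypothetical):

```python
import itertools
import json
import time

def sweep(fn, grid):
    """Hypothetical helper: call `fn` once for every combination of
    parameters in `grid` (a dict mapping name -> list of values) and
    collect per-run timings as a list of dicts."""
    results = []
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        t0 = time.perf_counter()
        fn(**params)
        elapsed_ms = (time.perf_counter() - t0) * 1e3
        results.append({**params, "time_ms": elapsed_ms})
    return results

# Example: sweep a dummy workload over sizes and world sizes, then
# emit the results as JSON for downstream tooling or CI tracking.
rows = sweep(lambda size, world_size: None,
             {"size": [1024, 2048], "world_size": [2, 4]})
print(json.dumps(rows, indent=2))
```

Emitting one flat dict per run keeps the output trivially consumable by JSON tooling, pandas, or a CI dashboard.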