Open
67 commits
595423d
Add benchmark capabilities for ops.
neoblizz Feb 3, 2026
8c965a1
Merge branch 'main' into neoblizz/iris-xops-perf
neoblizz Feb 7, 2026
ef227b0
Merge conflicts.
neoblizz Feb 7, 2026
f132ceb
Up the tritonBLAS commit.
neoblizz Feb 7, 2026
1628a61
...
neoblizz Feb 10, 2026
c26e872
Apply Ruff auto-fixes
github-actions[bot] Feb 10, 2026
3d4c7d7
Fix load vectorization and transpose config
ryanswann-amd Feb 11, 2026
5b02211
Apply Ruff auto-fixes
github-actions[bot] Feb 11, 2026
4c3b3f4
Add HBM buffered version
ryanswann-amd Feb 11, 2026
a301392
Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…
ryanswann-amd Feb 11, 2026
1f3b9ef
Apply Ruff auto-fixes
github-actions[bot] Feb 11, 2026
45288ff
Use workgroup specialized variant
ryanswann-amd Feb 13, 2026
b2aadcd
Apply Ruff auto-fixes
github-actions[bot] Feb 13, 2026
7b2321e
Update hbm buffered all gather matmul
ryanswann-amd Feb 16, 2026
a4d845f
Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…
ryanswann-amd Feb 16, 2026
9692222
Apply Ruff auto-fixes
github-actions[bot] Feb 16, 2026
44ebc97
Add tracing
ryanswann-amd Feb 16, 2026
0c2842e
Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…
ryanswann-amd Feb 17, 2026
11d017a
Apply Ruff auto-fixes
github-actions[bot] Feb 17, 2026
ace40d0
Add stages to all_gather_matmul_hbm_buffer
ryanswann-amd Feb 17, 2026
950c3a0
Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…
ryanswann-amd Feb 17, 2026
f7612bd
Apply Ruff auto-fixes
github-actions[bot] Feb 17, 2026
51bccb5
Updates to benchmark and kernel
ryanswann-amd Feb 17, 2026
9b71523
Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…
ryanswann-amd Feb 17, 2026
cbe2aff
Apply Ruff auto-fixes
github-actions[bot] Feb 17, 2026
11d9001
Add predictive params, fix pointer overflows, fix race conditions
Mar 3, 2026
3c4cb4d
Apply Ruff auto-fixes
github-actions[bot] Mar 3, 2026
f2f755a
Merge branch 'neoblizz/iris-xops-perf' into ryaswann/iris_xops_perf
ryanswann-amd Mar 3, 2026
77eff5b
Reverse 2D block translate
Mar 3, 2026
dcafd2a
Properly use iris tracing APIs
Mar 3, 2026
6fdad6d
Apply Ruff auto-fixes
github-actions[bot] Mar 3, 2026
08755b7
Remove test.sh
Mar 3, 2026
88f7767
All gather matmul with improved performance. (#415)
ryanswann-amd Mar 5, 2026
f558293
Fix CI: restore vectorization hints, align tritonBLAS versions, remov…
ryanswann-amd Mar 6, 2026
e5dd77f
Merge main into neoblizz/iris-xops-perf
ryanswann-amd Mar 6, 2026
477b472
Fix CI: increase default N to match FusedConfig block_size_n=256
ryanswann-amd Mar 6, 2026
76cc30d
Revert "Fix CI: increase default N to match FusedConfig block_size_n=…
ryanswann-amd Mar 6, 2026
9743b13
Remove unnecessary block size assertions — Triton handles masking
ryanswann-amd Mar 6, 2026
a86dc04
Initial plan
Copilot Mar 11, 2026
445b25c
Add vectorization hints and tests for HBM buffer all-gather matmul
Copilot Mar 12, 2026
2f0099f
Add vectorization hints and tests for HBM buffer all-gather matmul (#…
ryanswann-amd Mar 12, 2026
39c213d
Merge branch 'main' into neoblizz/iris-xops-perf
ryanswann-amd Mar 16, 2026
bad3422
Initial plan for PR cleanup
Copilot Apr 8, 2026
2a9f31a
Cleanup PR: address reviewer feedback
Copilot Apr 8, 2026
98d25bf
Clarify bias handling in matmul_reduce_scatter: raise NotImplementedE…
Copilot Apr 8, 2026
196bef7
Merge branch 'main' into neoblizz/iris-xops-perf
Copilot Apr 8, 2026
f4b4e75
Sync with main, remove unneeded scripts, minimize PR footprint
Copilot Apr 8, 2026
9d29d8c
Port HBM buffer benchmark to iris.bench, remove helper scripts
Copilot Apr 8, 2026
2c8b226
Replace shmem with ctx in hbm_buffer kernel and tests
Copilot Apr 9, 2026
1f7f6f1
Updated copilot instructions: you have GPUs, use them
mawad-amd Apr 9, 2026
9999273
Add benchmark comparison plots for HBM buffer vs baseline
Copilot Apr 9, 2026
e6b7114
Merge benchmarks and tests, remove dead code
Copilot Apr 9, 2026
5fac461
Update benchmark comparison plots with MxNxK x-axis labels
Copilot Apr 9, 2026
184331c
Extend trace events with categorized ID ranges and fix tracing abuse
mawad-amd Apr 9, 2026
1b6df88
Apply Ruff auto-fixes
github-actions[bot] Apr 9, 2026
6b70059
Bump trace schema version to 1.2 for new event categories
mawad-amd Apr 9, 2026
8607e38
Add RCCL baseline and rename algorithms to one_shot/prefetch
mawad-amd Apr 9, 2026
63c978b
Fix RCCL benchmark: use regular CUDA memory, not iris symmetric heap
mawad-amd Apr 9, 2026
6a8ad6b
Fix RCCL benchmark: use dist.get_world_size() instead of ctx
mawad-amd Apr 9, 2026
292ee11
Update HBM buffer kernel defaults and benchmark for parameter sweep
Copilot Apr 9, 2026
6979787
Update benchmark plots with new vs previous defaults comparison
Copilot Apr 9, 2026
02ea2b6
Fix preamble FusedConfig() defaults and add shape-adaptive auto-config
ryanswann-amd Apr 11, 2026
64a631f
Fix collective ordering deadlock in fd_passing at ws<8
ryanswann-amd Apr 11, 2026
7d3f476
Apply Ruff auto-fixes
github-actions[bot] Apr 11, 2026
ef0a173
Port auto-config system from ryanswann-amd/iris feature/auto-config-x…
Copilot Apr 15, 2026
2528e8e
Add docs/benchmark-results/ to .gitignore
Copilot Apr 15, 2026
caed8a5
Remove accidentally committed .github/agents and benchmark images
Copilot Apr 15, 2026
63 changes: 25 additions & 38 deletions .github/copilot-instructions.md
@@ -12,61 +12,48 @@ Iris is a Triton-based framework for Remote Memory Access (RMA) operations on AM

 **FOLLOW THESE INSTRUCTIONS EXACTLY. Reference these instructions first before using search or bash commands.**

-## Prerequisites
+## GPU Environment

-- **GPU**: AMD GPUs with ROCm compatibility (tested on MI300X, MI350X & MI355X)
-> **Note**: See below for instructions on development without AMD GPU access
-- **ROCm/HIP Toolkit**: Required for building C++/HIP components
-- **Docker/Apptainer**: Recommended for containerized development
+**You have AMD GPU access.** Your runner has 8x MI325X GPUs (gfx942) with ROCm installed. Do NOT skip GPU operations or say you lack GPU access. You can and should:

-## Build
+- Run benchmarks and post results (charts, tables) in PR comments
+- Run tests with `torchrun --nproc_per_node=N` for multi-GPU tests
+- Use `rocm-smi` to verify GPU status
+- Run `python -c "import torch; print(torch.cuda.device_count())"` to confirm GPU count

-### Docker Development Environment (Recommended)
+When asked to run a benchmark, **run it and post the output**. Do not say you cannot.
+
+### Running multi-GPU tests and benchmarks
+
+Multi-GPU tests require `torch.distributed` initialization before pytest:
 ```bash
-# Build and start development container (takes 45-60 minutes - NEVER CANCEL)
-docker compose up --build -d
+# Single GPU
+pytest tests/unittests/ -v --tb=short

-# Attach to running container
-docker attach iris-dev
+# Multi-GPU (N = number of GPUs)
+torchrun --nproc_per_node=N -m pytest tests/ -v --tb=short

-# Install Iris in development mode
-cd iris && pip install -e ".[dev]"
+# Benchmarks use iris.bench framework
+torchrun --nproc_per_node=8 benchmark/ops/bench_<name>.py
 ```

-### Alternative Docker Setup
-```bash
-# Build Docker image manually
-./docker/build.sh <image-name> # Takes 45-60 minutes
+### iris.bench framework

-# Run container
-./docker/run.sh <image-name>
+Benchmarks use the declarative `iris.bench` framework. See existing `benchmark/ops/bench_*.py` files for examples. Output includes latency, throughput, and bandwidth tables. When posting benchmark results in PR comments, format as markdown tables.

-# Install Iris
-cd iris && pip install -e ".[dev]"
-```
+## Prerequisites

-### Apptainer Setup
-```bash
-# Build and run Apptainer image
-./apptainer/build.sh
-./apptainer/run.sh
+- **GPU**: AMD GPUs with ROCm compatibility (tested on MI300X, MI325X, MI350X & MI355X)
+- **ROCm/HIP Toolkit**: Required for building C++/HIP components
+- **Docker/Apptainer**: Recommended for containerized development

-# Install Iris
-pip install -e ".[dev]"
-```
+## Build

-### Local Development (Not Recommended)
+iris is already installed in your environment via `pip install -e .` in the setup steps. You do not need to build or install anything. If you need to reinstall after modifying `setup.py` or C extensions:
 ```bash
-# Requires ROCm/HIP toolkit installation
 pip install -e ".[dev]"
 ```

-### Development Without AMD GPU
-If you don't have access to AMD GPUs, you can still contribute to the project:
-- **Code Editing**: Start editing code directly in your local environment
-- **CI Testing**: The project has comprehensive CI pipelines that will test your changes automatically. You can check the CI logs if your changes fail to understand what went wrong.
-- **Local Validation**: Run linting and formatting locally: `ruff check . --fix && ruff format .`
-
 ## Run

 ### Testing
8 changes: 7 additions & 1 deletion .gitignore
@@ -28,6 +28,8 @@ omni*.pdf
 slurm*.out

 *.egg-info
+*.backup
+*.with_chunked

 examples/gemm/results/*
 asm/
@@ -57,4 +59,8 @@ gpucore.*
 logs/
 *.cap
 hsakmt_counters.csv
-core
+core
+.intellikit/
+.github/agents/docs/benchmark-results/
+.github/agents/
+docs/benchmark-results/*.png
Empty file.