
Hnimrama/ix atom#238

Merged
hnimra-amd merged 68 commits into dev/dtni from hnimrama/IX-atom
Jun 29, 2026

@hnimra-amd hnimra-amd commented Jun 23, 2026


PR #238 — InferenceX ATOM W1 (MI300X perf gates)

PR: #238
Head: hnimrama/IX-atom · Base: dev/dtni

Jira: AIMVT-236 · AIMVT-244


Motivation

CVS needs a first-class InferenceX ATOM automation path aligned with the DTNI Validation Tracker (IX ATOM) — not the legacy inferencemax_single uplift. This PR:

  • Introduces the inferencex_atom_single suite with an ATOM-native driver (atom.entrypoints.openai_server + atom.benchmarks.benchmark_serving).
  • Ships W1 DeepSeek R1 FP8 variant configs for MI300X and MI355X (perf, smoke, MTP3) with calibrated / CI-seeded thresholds.
  • Closes M1 / Phase A on MI300X: W1 perf with enforce_thresholds: true after lab confirmation.
  • Documents the IX-atom roadmap (W1–W18, accuracy, metric tiers, parity frameworks) in plans/inferencex-atom-cvs-automation-plan.md.

MI355X variant configs ship with enforce_thresholds: false until hardware is available — they do not block merge or MI300X milestone work.

Base branch: dev/dtni.


Technical Details

Suite and orchestration

  • Rename inferencemax_single → inferencex_atom_single; legacy inferencemax/ configs removed.
  • New InferenceXAtomJob (inferencex_atom_orch.py):
    • params.driver=atom → ATOM server + benchmark_serving; JSON artifacts parsed via to_client_metrics.
    • params.driver=vllm retained for interim GPT-OSS uplift variants only.
  • Canonical config layout: flat sibling pairs under cvs/input/config_file/inference/inferencex_atom_single/:
    • {gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]_config.json
    • {gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]_threshold.json
  • schema_version: 1, typed loader, and ix_recipes.json recipe pins (dsr1-fp8-mi300x-atom, etc.).
  • Cluster examples: mi300x_atom_single.json, mi355x_atom_single.json; container names pinned (inferencex_atom_mi300x / inferencex_atom_mi355x).
  • Shared suite helpers: inference_suite_lifecycle.py, inference_suite_results_table.py.
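
The sibling-pair naming convention above can be sketched as a small helper. This is only an illustration of the filename pattern quoted in the PR; `variant_stem` and `sibling_pair` are hypothetical names and are not claimed to exist in the CVS codebase.

```python
from pathlib import Path
from typing import Optional, Tuple

# Directory quoted from the PR description.
CONFIG_DIR = Path("cvs/input/config_file/inference/inferencex_atom_single")


def variant_stem(gpu: str, model: str, precision: str, mode: Optional[str] = None) -> str:
    """Build '{gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]'."""
    parts = [gpu, "inferencex-atom-single", model, precision]
    if mode:  # the mode suffix (perf, smoke, mtp3) is optional in the pattern
        parts.append(mode)
    return "_".join(parts)


def sibling_pair(gpu: str, model: str, precision: str,
                 mode: Optional[str] = None) -> Tuple[Path, Path]:
    """Return the (config, threshold) paths that sit side by side in CONFIG_DIR."""
    stem = variant_stem(gpu, model, precision, mode)
    return CONFIG_DIR / f"{stem}_config.json", CONFIG_DIR / f"{stem}_threshold.json"


if __name__ == "__main__":
    for path in sibling_pair("mi300x", "deepseek-r1", "fp8", "perf"):
        print(path)
```

Keeping both files under one stem means a loader can find the threshold file from the config path alone, which is presumably why the layout is flat rather than nested.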

W1 variants shipped

| Variant stem | Arch | enforce_thresholds | Notes |
|---|---|---|---|
| mi300x_inferencex-atom-single_deepseek-r1_fp8_perf | MI300X | true | M1 gate: ISL=OSL=1024, CONC=128/256, 1000 prompts |
| mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke | MI300X | false | 128-prompt pre-gate |
| mi300x_inferencex-atom-single_deepseek-r1_fp8_mtp3 | MI300X | false | MTP3 recipe |
| mi355x_inferencex-atom-single_deepseek-r1_fp8_perf | MI355X | false | CI-seeded thresholds (plan Section 4.3) |
| mi355x_inferencex-atom-single_deepseek-r1_fp8_mtp3 | MI355X | false | CI-seeded |

Threshold / metrics plumbing

  • Tiered gates via test_cell_metrics (METRIC_TIERS: throughput, ttft, tpot, health, record) — one pytest row per tier per sweep cell.
  • to_client_metrics: derive client.failed when ATOM omits it (num_prompts - completed), then compute client.success_rate; add client.output_tput_per_gpu.
  • The tpot tier gates on p99_tpot_ms: ATOM benchmark_serving emits p99 tails, and p95_tpot_ms may be absent even with metric_percentiles: "95,99". Tier enforcement skips any metric missing from the artifact.
  • MI300X W1 perf thresholds recalibrated from 2026-06-25 lab run (throughput mins = measured × 0.9; latency maxes = measured × 1.1).
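
A minimal sketch of the metric derivation and skip-if-missing gating described in the bullets above. It is not the actual `to_client_metrics` implementation: the function names, the raw artifact keys (`completed`, `failed`, `output_throughput`), and the success-rate formula (completed / num_prompts) are assumptions made for illustration.

```python
from typing import Dict, Tuple


def derive_client_metrics(raw: Dict[str, float], num_prompts: int, num_gpus: int) -> Dict[str, float]:
    """Map a benchmark artifact to client.* metrics (hypothetical key names)."""
    completed = raw["completed"]
    failed = raw.get("failed")
    if failed is None:
        # ATOM may omit the failure count; derive it as num_prompts - completed.
        failed = num_prompts - completed
    metrics = {
        "client.failed": failed,
        # Assumed definition: fraction of requested prompts that completed.
        "client.success_rate": completed / num_prompts if num_prompts else 0.0,
        "client.output_throughput": raw["output_throughput"],
        "client.output_tput_per_gpu": raw["output_throughput"] / num_gpus,
    }
    # Percentile metrics are copied only when the artifact actually has them.
    for key in ("p95_tpot_ms", "p99_tpot_ms"):
        if key in raw:
            metrics[f"client.{key}"] = raw[key]
    return metrics


def gate_tier(metrics: Dict[str, float],
              thresholds: Dict[str, Tuple[str, float]]) -> Dict[str, bool]:
    """Check each threshold; metrics missing from the artifact are skipped."""
    results = {}
    for name, (kind, bound) in thresholds.items():
        if name not in metrics:
            continue  # skipped rather than failed
        value = metrics[name]
        results[name] = value >= bound if kind == "min" else value <= bound
    return results


if __name__ == "__main__":
    raw = {"completed": 998, "output_throughput": 2867.0, "p99_tpot_ms": 46.7}
    m = derive_client_metrics(raw, num_prompts=1000, num_gpus=8)
    print(m)
    print(gate_tier(m, {"client.p99_tpot_ms": ("max", 51.4),
                        "client.p95_tpot_ms": ("max", 50.0)}))
```

Skipping absent metrics keeps a tier from failing just because a percentile was not emitted, at the cost that a silently missing metric is never gated; gating on p99_tpot_ms, which the PR says ATOM does emit, avoids that gap for the tpot tier.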

Platform / shared infra (supporting changes)

  • cvs/lib/utils/ — shared config_loader, verdict, sweep selector.
  • vllm_single suite refactor to the same lifecycle / metric pattern.
  • Runbook: cvs/input/config_file/inference/inferencex_atom_single/README.md.

Out of scope (follow-up on dev/dtni)

  • M2 gsm8k accuracy
  • M3 P1 workloads (W2, W3, W13, W17)
  • M4 inferencex_atom_vllm_single / inferencex_atom_sglang_single parity frameworks
  • M5 multi-node (prioritized immediately after M4 parity)

Test Plan

CI / unit (no GPU)

  • pytest cvs/tests/inference/inferencex_atom/ -q
  • pytest cvs/lib/inference/unittests/test_inferencex_atom_parsing.py -q
  • pytest cvs/lib/inference/unittests/test_inferencex_atom_config_loader.py -q
  • Config loader / sweep selector / orch parse tests pass

Lab — MI300X smoke

Launcher: CTR-SVDT-L005 (10.7.54.167) · GPU node: 10.245.135.75

cd ~/cvs && make install && source .cvs_venv/bin/activate

SMOKE_DIR=~/input/config_file/inference/inferencex_atom_single/smoke
mkdir -p "$SMOKE_DIR"
cvs copy-config inference/inferencex_atom_single/mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke_config.json \
  --output "$SMOKE_DIR/mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke_config.json"
cvs copy-config inference/inferencex_atom_single/mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke_threshold.json \
  --output "$SMOKE_DIR/mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke_threshold.json"

TS=$(date +%Y%m%d_%H%M%S)
HTML="$HOME/cvs_results/${TS}_ix-atom-smoke_mi300x.html"
LOG="$HOME/cvs_results/${TS}_ix-atom-smoke_mi300x.log"

cvs run inferencex_atom_single \
  --cluster_file ~/input/cluster_file/mi300x_atom_single.json \
  --config_file "$SMOKE_DIR/mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke_config.json" \
  --html="$HTML" --self-contained-html --log-file="$LOG" -vvv -s
  • 11 passed, 0 failed (~13 min)

Lab — MI300X W1 perf (M1 gate)

cd ~/cvs && make install && source .cvs_venv/bin/activate

PERF_DIR=~/input/config_file/inference/inferencex_atom_single/perf
mkdir -p "$PERF_DIR"
cvs copy-config inference/inferencex_atom_single/mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_config.json \
  --output "$PERF_DIR/mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_config.json"
cvs copy-config inference/inferencex_atom_single/mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_threshold.json \
  --output "$PERF_DIR/mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_threshold.json"

TS=$(date +%Y%m%d_%H%M%S)
HTML="$HOME/cvs_results/${TS}_ix-atom-w1-perf_mi300x.html"
LOG="$HOME/cvs_results/${TS}_ix-atom-w1-perf_mi300x.log"

cvs run inferencex_atom_single \
  --cluster_file ~/input/cluster_file/mi300x_atom_single.json \
  --config_file "$PERF_DIR/mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_config.json" \
  --html="$HTML" --self-contained-html --log-file="$LOG" -vvv -s
  • Lifecycle stages pass (container, model fetch, server start, benchmark client, teardown).
  • Both sweep cells: CONC=128, CONC=256 (ISL=OSL=1024, TP=8); server reused across cells.
  • All metric tiers pass under enforce_thresholds: true.
  • HTML report attached to PR.
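
For orientation, the "one row per tier per sweep cell" layout exercised by this run can be sketched as a cross product of cells and tiers. The tier names and cell parameters come from the PR text; `cell_id`, `tier_rows`, and the ID format are hypothetical and do not reflect how `test_cell_metrics` actually names its rows.

```python
from itertools import product
from typing import Dict, Iterable, List

# Tier names as listed under METRIC_TIERS in the PR description.
METRIC_TIERS = ("throughput", "ttft", "tpot", "health", "record")

# The two W1 perf sweep cells (ISL=OSL=1024, TP=8).
SWEEP_CELLS = (
    {"isl": 1024, "osl": 1024, "conc": 128, "tp": 8},
    {"isl": 1024, "osl": 1024, "conc": 256, "tp": 8},
)


def cell_id(cell: Dict[str, int]) -> str:
    """Readable identifier for one sweep cell (format is an assumption)."""
    return f"isl{cell['isl']}-osl{cell['osl']}-conc{cell['conc']}-tp{cell['tp']}"


def tier_rows(cells: Iterable[Dict[str, int]],
              tiers: Iterable[str] = METRIC_TIERS) -> List[str]:
    """One row per (cell, tier) pair, cells in sweep order."""
    return [f"{cell_id(c)}::{t}" for c, t in product(cells, tiers)]


if __name__ == "__main__":
    for row in tier_rows(SWEEP_CELLS):
        print(row)
```

A list like this could be passed as `ids=` to `pytest.mark.parametrize` so that each tier of each cell reports as its own row in the HTML report.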

MI355X

  • Not required for merge — configs ship; flip enforce_thresholds when hardware is available.

Test Result

MI300X lab — mi300x_inferencex-atom-single_deepseek-r1_fp8_perf

  • Branch: hnimrama/IX-atom @ 07c90a7
  • Launcher: CTR-SVDT-L005 (10.7.54.167) · GPU node: 10.245.135.75
  • Install: make install from repo root (.cvs_venv)
  • Image: rocm/atom-dev:latest
  • Model: deepseek-ai/DeepSeek-R1-0528 (FP8, TP=8)
  • Outcome: 17 passed, 0 failed (~22 min)
| Cell | client.output_throughput (measured) | Threshold (min) | Result |
|---|---|---|---|
| CONC=128 | 2867 tok/s | 2580 tok/s | PASS |
| CONC=256 | 4697 tok/s | 4227 tok/s | PASS |

| Cell | TTFT / TPOT gates | Result |
|---|---|---|
| CONC=128 | mean TTFT 811 ms (max 892); p99 TTFT 6511 ms (max 7162); mean TPOT 42.5 ms; p99 TPOT 46.7 ms (max 51.4) | PASS |
| CONC=256 | mean TTFT 728 ms; p99 TPOT 59.7 ms (max 65.6) | PASS |
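
The calibration rule stated in this PR (throughput mins = measured × 0.9; latency maxes = measured × 1.1) can be checked against the reported numbers with a short sketch. The helper names are hypothetical, and the rounding applied when the thresholds were written to the threshold file is an assumption; because the measurements shown here are already rounded, a recomputed bound may differ from the reported one in the last digit.

```python
def throughput_floor(measured: float, margin: float = 0.9) -> float:
    """Minimum acceptable throughput: measured value scaled down by the margin."""
    return measured * margin


def latency_cap(measured: float, margin: float = 1.1) -> float:
    """Maximum acceptable latency: measured value scaled up by the margin."""
    return measured * margin


# (cell, measured tok/s, reported threshold) taken from the results above.
THROUGHPUT_ROWS = [("CONC=128", 2867, 2580), ("CONC=256", 4697, 4227)]

if __name__ == "__main__":
    for cell, measured, reported in THROUGHPUT_ROWS:
        floor = throughput_floor(measured)
        print(f"{cell}: recomputed floor {floor:.1f} tok/s, reported {reported} tok/s")
    # Latency example: CONC=128 mean TTFT of 811 ms with a reported max of 892 ms.
    print(f"CONC=128 mean TTFT cap: {latency_cap(811):.1f} ms")
```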

MI300X lab — smoke

  • Outcome: 11 passed, 0 failed (~13 min)

Artifacts (attach to PR)

inferencex_atom_single_2026-06-25T175704.zip

Unit tests

  • CI / local unit suite green on PR branch.

Submission Checklist

  • PR open: #238
  • Jira linked: AIMVT-236, AIMVT-244
  • Base branch is dev/dtni
  • Lab run used make install from this branch (not a stale site-packages install)
  • MI300X W1 perf + smoke HTML reports attached
  • enforce_thresholds: true only on MI300X W1 perf — confirmed in lab
  • MI355X variants left at enforce_thresholds: false (pending hardware)
  • Plan doc reviewed for milestone alignment
  • Reviewer aware M2 (gsm8k), M3 (W2/W3/W13/W17), and M5 (multi-node) are follow-ups on dev/dtni

@hnimra-amd hnimra-amd marked this pull request as draft June 23, 2026 23:28
@hnimra-amd hnimra-amd marked this pull request as ready for review June 24, 2026 16:28
hnimra-amd added a commit that referenced this pull request Jun 24, 2026
Stacked PR body covering motivation, technical details, test plan, lab results, and checklist targeting dev/dtni.
Reverts commit 4a8425f, restoring the changes from PR #225 on dev/dtni.
Probe python3.13..python3 for import vllm; export BENCH_PY and BENCH_SCRIPT. Use shlex.quote for docker exec bash -c. Align InferenceMax client completion with Serving Benchmark Result or End-to-end Latency.
Search site-packages and ancestor paths, verify the file is readable, and document vllm[bench] when wheels omit benchmarks/.
Use CVS_GPU_MEMORY_UTIL in sample config and serve script to avoid vLLM unknown-env warnings. Extend default readiness poll budget to 60 and grep full server logs so Uvicorn ready is not missed after long model loads.
Wheels often omit vllm/benchmarks; resolve the driver via eval exports, run python -m vllm.entrypoints.cli.main bench serve when needed, and fail fast on missing-script log patterns in InferenceMax and base polling.
vLLM random workloads scale (ISL+OSL)*(1+r); clamp ratio when it would exceed MML, pass --temperature 0 for greedy parity, and forward --metric-percentiles in InferenceMax and vllm_single clients.
Read client_poll_count and client_poll_wait_time from benchmark_params (defaults 50/60), document them and fix the inferencemax.rst table, and surface the keys in sample MI300X/MI355X configs.
…st polling

Gate benchmark success on Failed requests only after the summary is present;
tail more client log lines for InferenceMax. Variant and benchmark_params accept
bench_max_failed_requests (default 0 remains strict for CI).
Move InferenceMax loading onto substitute_config and a typed InferenceMaxVariantConfig with legacy adapters for InferenceMaxJob until the driver is ported.
…ase 2)

Flatten MI300X and MI355X variant configs to paths/model/container/roles/params/sweep and client.* threshold specs with enforce_thresholds false until recalibrated.
…Phase 2)

Use variant_config and legacy adapter fixtures, parametrization from sweep.runs, and unit tests for load_variant and threshold adapters.
…ion 1 (Phase 2)

Point loader and threshold docs at inferencemax_config_loader.load_variant and the client.* sweep cell format.
Point run-cvs-tests and dtni-dev-guide at cvs.lib.utils and inference/utils loaders.
Standalone driver uses Python-built vllm serve, vllm bench serve, and artifact parsing. Drop legacy InferenceBaseJob path and factory construction.
…_args (Phase 3)

MI300X and MI355X variants drop host-script and bench_serving params in favor of Python serve args.
… (Phase 3)

Add model_fetch, test_metric, and new InferenceMaxJob lifecycle. Update conftest and unit tests for typed config.
…ase 3)

Document Python serve, client.* metrics, and expanded lifecycle test stages.
Host script staging was dropped when InferenceMaxJob moved to Python-built vllm serve.
InferenceMax and vllm_single build vllm serve in Python; this package remains for InferenceBaseJob paths.
…(Phase 5)

Replace legacy config/benchmark_params table with typed blocks and client.* thresholds. Document inferencemax_config_loader in AGENTS.md.
Verify stock results artifact maps to client.* metrics via FakeOrch.
…ngle

Adopt InferenceX ATOM as the framework identity while the suite is still internal. Renames the driver, config loader, pytest suite, variant configs, and documentation to inferencex_atom_single.
Gate p99_tpot_ms instead of absent p95_tpot_ms, skip missing tier metrics in actuals, and recalibrate MI300X perf thresholds from the 2026-06-25 lab run.
Replace per-node calibrated gates with conservative throughput floors and loose latency caps so healthy runs pass across lab nodes without recalibration.
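
One of the commit notes above says vLLM random workloads scale as (ISL+OSL)*(1+r) and that the ratio is clamped when the result would exceed MML. A minimal sketch of that clamp follows, assuming r is the random range ratio and MML is the maximum model length; `clamp_range_ratio` is a hypothetical helper, not the function used in the driver.

```python
def clamp_range_ratio(isl: int, osl: int, ratio: float, max_model_len: int) -> float:
    """Return the largest ratio r' <= ratio with (isl + osl) * (1 + r') <= max_model_len.

    If even r' = 0 does not fit, 0.0 is returned and the caller still has to
    shorten ISL/OSL; that case is outside this sketch.
    """
    base = isl + osl
    if base <= 0:
        raise ValueError("isl + osl must be positive")
    if base * (1 + ratio) <= max_model_len:
        return ratio  # worst-case request already fits
    # Solve (isl + osl) * (1 + r') = max_model_len for r', never going below zero.
    return max(0.0, max_model_len / base - 1)


if __name__ == "__main__":
    print(clamp_range_ratio(1024, 1024, 0.5, 4096))  # fits, ratio unchanged
    print(clamp_range_ratio(1024, 1024, 0.5, 2560))  # clamped to the largest fitting ratio
```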

@atnair-amd atnair-amd left a comment

Collaborator


Automated review from five-pass analysis (structure · duplication · unit tests · code quality · live validation run).

Blockers (must fix before merge): false-green on missing config_file · parse_results error paths untested · _client_log_failures untested
Majors: driver default guard · config/threshold structure diverges from vllm · server reuse helpers untested · build_server_cmd suppression untested · _merged_serve_args untested · tier-explosion untested · reuse_server_across_sweep default · no ATOM early-failure detection · wheel not in shared venv
Minors/NITs: see inline comments

No changes are requested without reviewer approval — comments only.

Comment thread cvs/lib/inference/utils/inferencex_atom_config_loader.py
Comment thread cvs/tests/inference/inferencex_atom/conftest.py
Comment thread cvs/tests/inference/inferencex_atom/inferencex_atom_single.py Outdated
Comment thread cvs/tests/inference/inferencex_atom/conftest.py
Comment thread cvs/lib/inference/inferencex_atom_orch.py
Comment thread cvs/tests/inference/inferencex_atom/inferencex_atom_single.py Outdated
Comment thread cvs/tests/inference/inferencex_atom/inferencex_atom_single.py
Comment thread cvs/tests/inference/inferencex_atom/inferencex_atom_single.py Outdated
Comment thread cvs/lib/inference/inferencex_atom_orch.py Outdated
Comment thread cvs/lib/inference/utils/vllm_benchmark_scripts/__init__.py
Comment thread cvs/lib/inference/utils/inferencex_atom_parsing.py
Comment thread cvs/lib/inference/inference_suite_lifecycle.py Outdated
Comment thread cvs/lib/inference/inference_suite_lifecycle.py Outdated
Comment thread cvs/lib/inference/utils/vllm_benchmark_scripts/vllm_serve_mi300x.sh

@amd-droy amd-droy left a comment

Contributor


looks good to me. thanks @hnimra-amd

@atnair-amd atnair-amd left a comment

Collaborator


@hnimra-amd hnimra-amd merged commit 85b6c72 into dev/dtni Jun 29, 2026
1 check passed
3 participants