Hnimrama/ix atom #238
Stacked PR body covering motivation, technical details, test plan, lab results, and checklist targeting dev/dtni.
d62122b to e17ab7f
Probe python3.13..python3 for import vllm; export BENCH_PY and BENCH_SCRIPT. Use shlex.quote for docker exec bash -c. Align InferenceMax client completion with Serving Benchmark Result or End-to-end Latency.
Search site-packages and ancestor paths, verify the file is readable, and document vllm[bench] when wheels omit benchmarks/.
Use CVS_GPU_MEMORY_UTIL in sample config and serve script to avoid vLLM unknown-env warnings. Extend default readiness poll budget to 60 and grep full server logs so Uvicorn ready is not missed after long model loads.
Wheels often omit vllm/benchmarks; resolve the driver via eval exports, run python -m vllm.entrypoints.cli.main bench serve when needed, and fail fast on missing-script log patterns in InferenceMax and base polling.
vLLM random workloads scale (ISL+OSL)*(1+r); clamp ratio when it would exceed MML, pass --temperature 0 for greedy parity, and forward --metric-percentiles in InferenceMax and vllm_single clients.
Read client_poll_count and client_poll_wait_time from benchmark_params (defaults 50/60), document them and fix the inferencemax.rst table, and surface the keys in sample MI300X/MI355X configs.
…st polling Gate benchmark success on Failed requests only after the summary is present; tail more client log lines for InferenceMax. Variant and benchmark_params accept bench_max_failed_requests (default 0 remains strict for CI).
Move InferenceMax loading onto substitute_config and a typed InferenceMaxVariantConfig with legacy adapters for InferenceMaxJob until the driver is ported.
…ase 2) Flatten MI300X and MI355X variant configs to paths/model/container/roles/params/sweep and client.* threshold specs with enforce_thresholds false until recalibrated.
…Phase 2) Use variant_config and legacy adapter fixtures, parametrization from sweep.runs, and unit tests for load_variant and threshold adapters.
…ion 1 (Phase 2) Point loader and threshold docs at inferencemax_config_loader.load_variant and the client.* sweep cell format.
Point run-cvs-tests and dtni-dev-guide at cvs.lib.utils and inference/utils loaders.
Standalone driver uses Python-built vllm serve, vllm bench serve, and artifact parsing. Drop legacy InferenceBaseJob path and factory construction.
…_args (Phase 3) MI300X and MI355X variants drop host-script and bench_serving params in favor of Python serve args.
… (Phase 3) Add model_fetch, test_metric, and new InferenceMaxJob lifecycle. Update conftest and unit tests for typed config.
…ase 3) Document Python serve, client.* metrics, and expanded lifecycle test stages.
Host script staging was dropped when InferenceMaxJob moved to Python-built vllm serve.
InferenceMax and vllm_single build vllm serve in Python; this package remains for InferenceBaseJob paths.
…(Phase 5) Replace legacy config/benchmark_params table with typed blocks and client.* thresholds. Document inferencemax_config_loader in AGENTS.md.
Verify stock results artifact maps to client.* metrics via FakeOrch.
…ngle Adopt InferenceX ATOM as the framework identity while the suite is still internal. Renames the driver, config loader, pytest suite, variant configs, and documentation to inferencex_atom_single.
Document per-variant ~/input subdirs to avoid ambiguous threshold discovery, remote launcher vs GPU node prerequisites, and ~/cvs_results output paths.
Elevate scaling to P1 milestone M5 immediately after M4 parity when hardware and suite recipes support nnodes>1; defer MTP+P2 widen to M6.
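One of the commits above notes that vLLM random workloads scale as (ISL+OSL)*(1+r) and that the ratio is clamped when it would exceed MML. A minimal sketch of that arithmetic follows, assuming MML means the maximum model length and that r is the random range ratio. The function name `clamp_range_ratio` is hypothetical and is not taken from the PR.

```python
def clamp_range_ratio(isl: int, osl: int, ratio: float, max_model_len: int) -> float:
    """Return a ratio r with (isl + osl) * (1 + r) <= max_model_len when possible.

    Hypothetical helper: the name and signature are illustrative only.
    """
    base = isl + osl
    if base <= 0:
        raise ValueError("isl + osl must be positive")
    worst_case = base * (1 + ratio)
    if worst_case <= max_model_len:
        return ratio  # already fits, leave untouched
    # Solve base * (1 + r) = max_model_len for r; never return a negative ratio.
    return max(0.0, max_model_len / base - 1)


# 2048 * 1.5 = 3072 exceeds 2560, so the ratio is reduced to 2560 / 2048 - 1.
print(clamp_range_ratio(1024, 1024, 0.5, 2560))  # 0.25
```

If ISL+OSL alone already exceeds the limit, this sketch returns 0.0 and the request still does not fit, so a real driver would likely need to reject that sweep cell separately.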
544f8ad to 73dfb65
Gate p99_tpot_ms instead of absent p95_tpot_ms, skip missing tier metrics in actuals, and recalibrate MI300X perf thresholds from the 2026-06-25 lab run.
Replace per-node calibrated gates with conservative throughput floors and loose latency caps so healthy runs pass across lab nodes without recalibration.
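The gating described in the last two commits (throughput floors, latency caps, and skipping metrics that are absent from the artifact) could look roughly like the sketch below. The function `check_gates`, the dictionary layout, and every numeric value are assumptions made for illustration; they are not the PR's threshold format or lab numbers.

```python
from typing import Mapping


def check_gates(
    actuals: Mapping[str, float],
    floors: Mapping[str, float],
    caps: Mapping[str, float],
) -> list[str]:
    """Return human-readable gate failures; an empty list means the run passes."""
    failures = []
    for name, floor in floors.items():
        if name not in actuals:
            continue  # metric absent from the artifact: skip it, do not fail
        if actuals[name] < floor:
            failures.append(f"{name}={actuals[name]} below floor {floor}")
    for name, cap in caps.items():
        if name not in actuals:
            continue  # same skip rule for latency caps
        if actuals[name] > cap:
            failures.append(f"{name}={actuals[name]} above cap {cap}")
    return failures


# Made-up values: p95_tpot_ms has a cap but is missing from actuals, so it is skipped.
actuals = {"client.output_throughput": 1500.0, "p99_tpot_ms": 40.0}
floors = {"client.output_throughput": 1000.0}
caps = {"p99_tpot_ms": 100.0, "p95_tpot_ms": 80.0}
print(check_gates(actuals, floors, caps))  # []
```

Skipping absent metrics keeps a healthy run green when the benchmark omits a percentile, at the cost of not noticing a metric that silently disappears; a stricter variant could log skipped names.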
atnair-amd left a comment
Automated review from five-pass analysis (structure · duplication · unit tests · code quality · live validation run).
Blockers (must fix before merge): false-green on missing config_file · parse_results error paths untested · _client_log_failures untested
Majors: driver default guard · config/threshold structure diverges from vllm · server reuse helpers untested · build_server_cmd suppression untested · _merged_serve_args untested · tier-explosion untested · reuse_server_across_sweep default · no ATOM early-failure detection · wheel not in shared venv
Minors/NITs: see inline comments
No changes are requested without reviewer approval — comments only.
amd-droy left a comment
looks good to me. thanks @hnimra-amd
PR #238 — InferenceX ATOM W1 (MI300X perf gates)
PR: #238
Head: hnimrama/IX-atom → Base: dev/dtni
Jira: AIMVT-236 · AIMVT-244
Motivation
CVS needs a first-class InferenceX ATOM automation path aligned with the DTNI Validation Tracker (IX ATOM), not the legacy inferencemax_single uplift. This PR:
- inferencex_atom_single suite with an ATOM-native driver (atom.entrypoints.openai_server + atom.benchmarks.benchmark_serving).
- enforce_thresholds: true after lab confirmation.
- plans/inferencex-atom-cvs-automation-plan.md.
MI355X variant configs ship with enforce_thresholds: false until hardware is available; they do not block merge or MI300X milestone work.
Base branch: dev/dtni.
Technical Details
Suite and orchestration
- inferencemax_single → inferencex_atom_single; legacy inferencemax/ configs removed.
- InferenceXAtomJob (inferencex_atom_orch.py):
  - params.driver=atom → ATOM server + benchmark_serving; JSON artifacts parsed via to_client_metrics.
  - params.driver=vllm retained for interim GPT-OSS uplift variants only.
- cvs/input/config_file/inference/inferencex_atom_single/:
  - {gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]_config.json
  - {gpu}_inferencex-atom-single_{model}_{precision}[_{mode}]_threshold.json
- schema_version: 1, typed loader, and ix_recipes.json recipe pins (dsr1-fp8-mi300x-atom, etc.).
- mi300x_atom_single.json, mi355x_atom_single.json; container names pinned (inferencex_atom_mi300x / inferencex_atom_mi355x).
- inference_suite_lifecycle.py, inference_suite_results_table.py.
W1 variants shipped
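The variants in this section are named after the file pattern {gpu}_inferencex-atom-single_{model}_{precision}[_{mode}], with _config.json and _threshold.json suffixes. A minimal sketch of composing those names is shown here; `variant_filenames` is a hypothetical helper, not a function from the PR.

```python
from typing import Optional


def variant_filenames(
    gpu: str, model: str, precision: str, mode: Optional[str] = None
) -> tuple[str, str]:
    """Build (config, threshold) filenames for one variant.

    Hypothetical helper that only mirrors the naming pattern quoted in the PR body.
    """
    stem = f"{gpu}_inferencex-atom-single_{model}_{precision}"
    if mode:
        stem += f"_{mode}"  # the [_{mode}] segment is optional
    return f"{stem}_config.json", f"{stem}_threshold.json"


config_name, threshold_name = variant_filenames("mi300x", "deepseek-r1", "fp8", "perf")
print(config_name)     # mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_config.json
print(threshold_name)  # mi300x_inferencex-atom-single_deepseek-r1_fp8_perf_threshold.json
```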
| Variant | enforce_thresholds |
| --- | --- |
| mi300x_inferencex-atom-single_deepseek-r1_fp8_perf | true |
| mi300x_inferencex-atom-single_deepseek-r1_fp8_smoke | false |
| mi300x_inferencex-atom-single_deepseek-r1_fp8_mtp3 | false |
| mi355x_inferencex-atom-single_deepseek-r1_fp8_perf | false |
| mi355x_inferencex-atom-single_deepseek-r1_fp8_mtp3 | false |
Threshold / metrics plumbing
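As a rough illustration of the client-metric derivation this section describes for to_client_metrics (client.failed, client.success_rate, client.output_tput_per_gpu), consider the sketch below. It is not the PR's implementation: the function name `derive_client_metrics`, the artifact keys `num_prompts`, `completed`, `failed`, and `output_throughput`, and the success-rate formula are all assumptions.

```python
def derive_client_metrics(artifact: dict, num_gpus: int) -> dict:
    """Derive client.* metrics from a benchmark result dict (illustrative only)."""
    num_prompts = artifact["num_prompts"]
    completed = artifact["completed"]

    # When the artifact omits a failure count, fall back to num_prompts - completed.
    failed = artifact.get("failed")
    if failed is None:
        failed = num_prompts - completed

    # Assumed definition: fraction of prompts that did not fail.
    success_rate = (num_prompts - failed) / num_prompts if num_prompts else 0.0

    metrics = {"client.failed": failed, "client.success_rate": success_rate}
    if "output_throughput" in artifact and num_gpus > 0:
        metrics["client.output_tput_per_gpu"] = artifact["output_throughput"] / num_gpus
    return metrics


# Made-up numbers, chosen only to show the fallback path.
sample = {"num_prompts": 100, "completed": 98, "output_throughput": 800.0}
print(derive_client_metrics(sample, num_gpus=8))
```

With the sample above, the fallback yields 2 failed requests, a success rate of 0.98, and 100.0 output tokens per second per GPU.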
- test_cell_metrics (METRIC_TIERS: throughput, ttft, tpot, health, record): one pytest row per tier per sweep cell.
- to_client_metrics: derive client.failed when ATOM omits it (num_prompts - completed), then compute client.success_rate; add client.output_tput_per_gpu.
- Gate p99_tpot_ms (ATOM benchmark_serving emits p99 tails; p95_tpot_ms may be absent even with metric_percentiles: "95,99"). Tier enforcement skips metrics missing from the artifact.
Platform / shared infra (supporting changes)
- cvs/lib/utils/: shared config_loader, verdict, sweep selector.
- vllm_single suite refactor to the same lifecycle / metric pattern.
- cvs/input/config_file/inference/inferencex_atom_single/README.md.
Out of scope (follow-up on dev/dtni)
- inferencex_atom_vllm_single / inferencex_atom_sglang_single parity frameworks
Test Plan
CI / unit (no GPU)
pytest cvs/tests/inference/inferencex_atom/ -q
pytest cvs/lib/inference/unittests/test_inferencex_atom_parsing.py -q
pytest cvs/lib/inference/unittests/test_inferencex_atom_config_loader.py -q
Lab - MI300X smoke
Launcher: CTR-SVDT-L005 (10.7.54.167) · GPU node: 10.245.135.75
Lab - MI300X W1 perf (M1 gate)
- CONC=128, CONC=256 (ISL=OSL=1024, TP=8); server reused across cells.
- enforce_thresholds: true.
MI355X
- enforce_thresholds when hardware is available.
Test Result
MI300X lab - mi300x_inferencex-atom-single_deepseek-r1_fp8_perf
- hnimrama/IX-atom@07c90a7
- CTR-SVDT-L005 (10.7.54.167) · GPU node: 10.245.135.75
- make install from repo root (.cvs_venv)
- rocm/atom-dev:latest
- deepseek-ai/DeepSeek-R1-0528 (FP8, TP=8)
- client.output_throughput (measured)
MI300X lab - smoke
Artifacts (attach to PR)
inferencex_atom_single_2026-06-25T175704.zip
Unit tests
Submission Checklist
- dev/dtni
- make install from this branch (not a stale site-packages install)
- enforce_thresholds: true only on MI300X W1 perf, confirmed in lab
- enforce_thresholds: false (pending hardware)
- dev/dtni