
[Parity with CUDA vLLM, CUDA SGLang]: native ATOM llm-d Kubernetes Distributed Inferencing Orchestrator for Production LLM Inferencing [Functional Enablement + Guides + Nightly CI] #1187

Description

@functionstackx

Suggestion Description

hi @chunfangamd @andyluo7

The majority of production LLM serving is deployed through Kubernetes, where the system must route requests to the right replica, preserve KV-cache locality, scale prefill/decode workers up and down, handle multi-node MoE, etc. llm-d is the Kubernetes orchestrator that officially supports AMD, and it is a production deployment target for AMD vLLM customers such as Oracle. What is the timeline for ATOM to support a k8s distributed inferencing orchestrator?

Here are the features we would like to see:

nightly upstream CI parity

Operating System

No response

GPU

No response

ROCm Component

No response
