RoleBasedGroup (RBG) is a Kubernetes API for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery.
It provides a common deployment pattern for production LLM inference, especially disaggregated architectures such as prefill/decode separation.
[2025-09-23] RBG v0.4.0 is released. Please check out the release notes for more details.
[2025-07-21] RBG v0.3.0 is released. Please check out the release notes for more details.
Traditional Kubernetes primitives (e.g. plain StatefulSets / Deployments) are ill-suited for LLM inference services that:
- run as multi-role topologies (gateway / router / prefill / decode),
- are performance-sensitive to GPU / network topology,
- and require atomic, cross-role operations (deploy, upgrade, scale, failover).
RBG treats an inference service as a role-based group, not a loose set of workloads. It models the service as a topology-aware, stateful, coordinated multi-role organism and manages it as a single unit.
- **Role**
  The basic scheduling and rollout unit. Each role (e.g. prefill, decode) has its own spec, lifecycle and policies.
- **RoleBasedGroup**
  A group of roles that together form one logical service (e.g. one LLM inference deployment).
RBG treats the Role as the atomic unit of scheduling and orchestration, and lets you configure the relationships between roles. It views a single inference service as a topology-aware, stateful, collaborative "Role Organism" rather than an isolated collection of Deployments.
Based on this philosophy, RBG has built the five core capabilities of SCOPE:
- **Stable**: Topology-aware deterministic operations with unique RoleID injection and minimal replacement domain principles.
- **Coordination**: Cross-role policy engine supporting deployment pairing, coordinated upgrades, linked recovery, and coordinated scaling.
- **Orchestration**: Defines role dependencies and precise startup sequences within a RoleBasedGroup. Topology self-aware service discovery injects the complete role topology into Pods, eliminating external service dependencies.
- **Performance**: Topology-aware placement with hardware affinity (GPU-NVLink > PCIe > RDMA > VPC) and role affinity scheduling.
- **Extensible**: Future-proof deployment abstraction using declarative APIs and plugin mechanisms to adapt to new architectures in weeks.
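To make the idea of role dependencies and startup sequences more concrete, here is a minimal Python sketch that derives a startup order from a dependency graph using a topological sort. It is an illustration only, not RBG's implementation or API: the role names come from this README, while the dependency edges, `ROLE_DEPENDENCIES`, and `startup_waves` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical role graph: each role maps to the roles it must wait for.
# The role names come from this README; the dependency edges are invented
# for illustration and are not taken from RBG itself.
ROLE_DEPENDENCIES = {
    "gateway": {"router"},
    "router": {"prefill", "decode"},
    "prefill": set(),
    "decode": set(),
}


def startup_waves(dependencies):
    """Group roles into waves; every role in a wave has all of its
    dependencies satisfied by earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(startup_waves(ROLE_DEPENDENCIES)):
        print(f"wave {index}: {', '.join(wave)}")
```

Roles in the same wave have no ordering constraint between them and could start in parallel. A circular dependency is rejected up front rather than leaving the group half-started.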
See the docs for more in-depth installation and production deployment instructions.
| RBG Version | Kubernetes Version | LeaderWorkerSet Version |
|---|---|---|
| main | >=v1.28.x | >=v0.7.0 |
| v0.4.0 | >=v1.28.x | >=v0.7.0 |
| v0.3.0 | >=v1.28.x | >=v0.6.0 |
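The table can be read as a set of minimum versions per RBG release. The sketch below encodes it as data and checks a cluster against it. The helper names (`COMPATIBILITY`, `parse_version`, `is_supported`) are hypothetical and not part of RBG, and the parser assumes plain numeric versions such as `v1.28.3`, so distribution suffixes like `+k3s1` would need extra handling.

```python
# Minimum versions copied from the compatibility table above.
# "main" refers to the development branch.
COMPATIBILITY = {
    "main":   {"kubernetes": (1, 28), "lws": (0, 7, 0)},
    "v0.4.0": {"kubernetes": (1, 28), "lws": (0, 7, 0)},
    "v0.3.0": {"kubernetes": (1, 28), "lws": (0, 6, 0)},
}


def parse_version(text):
    """Turn 'v1.28.3' or '0.7.0' into a tuple of ints, e.g. (1, 28, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_supported(rbg, kubernetes, lws):
    """Return True if the cluster meets the minimums for this RBG version."""
    minimums = COMPATIBILITY[rbg]
    return (
        parse_version(kubernetes) >= minimums["kubernetes"]
        and parse_version(lws) >= minimums["lws"]
    )


if __name__ == "__main__":
    print(is_supported("v0.4.0", "v1.28.3", "v0.7.0"))
```

Comparing versions as integer tuples avoids the string-comparison trap where `"1.9"` would sort after `"1.28"`.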
We welcome contributions through issues and PRs! See CONTRIBUTING.md.
Learn how to engage with the Kubernetes community on the community page.
You can reach the maintainers of this project at:
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.
RBG's design draws on, and reuses code from, the following project: lws
