7 changes: 7 additions & 0 deletions docs/proto/provider.md
@@ -299,6 +299,7 @@
| `ephemeral_storage` | [ResourcePair](#akash.inventory.v1.ResourcePair) | | |
| `volumes_attached` | [ResourcePair](#akash.inventory.v1.ResourcePair) | | |
| `volumes_mounted` | [ResourcePair](#akash.inventory.v1.ResourcePair) | | |
| `gpu_interconnect` | [ResourcePair](#akash.inventory.v1.ResourcePair) | | GPUInterconnect reports the node's GPU-interconnect HCA capacity. The inventory operator populates Capacity/Allocatable/Allocated from the Kubernetes allocatable values for whichever `rdma/rdma_shared_device_*` extended resource the cluster's device plugin publishes. The `rdma/*` prefix is the device plugin's naming convention; see NodeCapabilities.interconnect_resource_name. |
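
The lookup described for `gpu_interconnect` can be sketched roughly as follows. This is a minimal illustration, not the inventory operator's actual code: `findInterconnectResource` and the plain `map[string]int64` standing in for a node's allocatable list are hypothetical, and picking the lexicographically first match when several `rdma/rdma_shared_device_*` resources exist is an assumption made only to keep the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rdmaSharedPrefix is the device plugin's naming convention quoted in the field docs.
const rdmaSharedPrefix = "rdma/rdma_shared_device_"

// findInterconnectResource (hypothetical helper) scans a node's allocatable
// resources and returns the first rdma/rdma_shared_device_* entry, if any.
// Names are sorted so the choice does not depend on map iteration order.
func findInterconnectResource(allocatable map[string]int64) (string, int64, bool) {
	names := make([]string, 0, len(allocatable))
	for name := range allocatable {
		if strings.HasPrefix(name, rdmaSharedPrefix) {
			names = append(names, name)
		}
	}
	if len(names) == 0 {
		// No GPU interconnect capability: the resource name stays empty.
		return "", 0, false
	}
	sort.Strings(names)
	return names[0], allocatable[names[0]], true
}

func main() {
	// Illustrative allocatable values; real quantities come from the cluster.
	allocatable := map[string]int64{
		"cpu":                         64,
		"nvidia.com/gpu":              8,
		"rdma/rdma_shared_device_ib": 63,
	}
	name, qty, ok := findInterconnectResource(allocatable)
	fmt.Println(name, qty, ok) // rdma/rdma_shared_device_ib 63 true
}
```

A node without any matching resource yields an empty name, which matches the "empty when the node has no GPU interconnect capability" behaviour described for `interconnect_resource_name`.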



@@ -347,6 +348,9 @@
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| `storage_classes` | [string](#string) | repeated | |
| `interconnect_resource_name` | [string](#string) | | Kubernetes extended-resource name that the cluster's device plugin publishes for GPU interconnect HCAs (e.g. rdma/rdma_shared_device_ib for an InfiniBand fabric, rdma/rdma_shared_device_eth for RoCE). The `rdma/*` prefix is the device plugin's own convention (Mellanox/NVIDIA) and is kept unchanged here. Empty when the node has no GPU interconnect capability. Discovered by the inventory operator from the Kubernetes allocatable resources. |
| `interconnect_fabric` | [string](#string) | | GPU interconnect fabric type: "infiniband" or "roce". Internal and informational only: the SDL surface is fabric-agnostic, and tenants only declare `interconnect: []` or `interconnect: { group: <name> }`. Derived from /sys/class/infiniband/<dev>/ports/1/link_layer on the host node. |
| `nccl_hca_prefixes` | [string](#string) | repeated | NCCL HCA device-name prefixes present on this node, one per distinct family (e.g. ["mlx5"], or ["mlx5","bnxt_re"] on a mixed-vendor host). The same key serves InfiniBand and RoCE because NCCL uses the IB verbs API for both. The prefixes are joined with commas and injected as NCCL_IB_HCA when scheduling GPU interconnect workloads; NCCL accepts comma-separated device prefixes natively. Discovered from /sys/class/infiniband/<dev> on the host. |
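
As a rough sketch of how `interconnect_fabric` and `nccl_hca_prefixes` could be derived and consumed, the Go snippet below works on values already read from sysfs. All function names are hypothetical. The mapping of an `Ethernet` link layer to "roce" follows the field description, and the prefix heuristic (strip the trailing device index and any trailing underscore) is an assumption that happens to fit names such as `mlx5_0` and `bnxt_re0`; the real discovery logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// fabricFromLinkLayer maps the contents of
// /sys/class/infiniband/<dev>/ports/1/link_layer to a fabric name.
func fabricFromLinkLayer(linkLayer string) string {
	switch strings.TrimSpace(linkLayer) {
	case "InfiniBand":
		return "infiniband"
	case "Ethernet":
		return "roce"
	default:
		return "" // unknown or missing link layer
	}
}

// hcaPrefix (assumed heuristic) drops the trailing device index, then any
// trailing underscore: "mlx5_0" -> "mlx5", "bnxt_re0" -> "bnxt_re".
func hcaPrefix(dev string) string {
	return strings.TrimRight(strings.TrimRight(dev, "0123456789"), "_")
}

// distinctPrefixes returns one prefix per device family, in first-seen order.
func distinctPrefixes(devs []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, d := range devs {
		p := hcaPrefix(d)
		if p != "" && !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	return out
}

// ncclIBHCA joins the prefixes into the value injected as NCCL_IB_HCA.
func ncclIBHCA(prefixes []string) string {
	return strings.Join(prefixes, ",")
}

func main() {
	devs := []string{"mlx5_0", "mlx5_1", "bnxt_re0"} // example device names
	fmt.Println("NCCL_IB_HCA=" + ncclIBHCA(distinctPrefixes(devs))) // NCCL_IB_HCA=mlx5,bnxt_re
	fmt.Println(fabricFromLinkLayer("InfiniBand\n"))                // infiniband
}
```

Keeping the functions free of file I/O makes the mapping easy to unit-test; a caller would read the sysfs entries and pass the strings in.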



@@ -598,6 +602,9 @@
| `expose` | [ServiceExpose](#akash.manifest.v2beta3.ServiceExpose) | repeated | |
| `params` | [ServiceParams](#akash.manifest.v2beta3.ServiceParams) | | |
| `credentials` | [ImageCredentials](#akash.manifest.v2beta3.ImageCredentials) | | |
| `interconnect_group` | [string](#string) | | InterconnectGroup carries the SDL gpu.attributes.interconnect_group peer-group label. Lifted from Resources.GPU.Attributes by the manifest builder so the off-chain workload builder can label pods for per-group anti-affinity. Services sharing the same value form one NCCL peer group; the provider schedules them on distinct nodes. Empty when the service is not part of any GPU interconnect group.

JSON / YAML tags carry `omitempty`: the on-chain manifest `version` is a SHA hash of the JSON-serialized off-chain manifest, so any field that always serializes (even at zero value) would shift the hash for every non-interconnect SDL and break send-manifest validation on existing leases. |
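
The `omitempty` reasoning can be checked with a small experiment. The sketch below is illustrative only: the three structs are simplified, hypothetical stand-ins for the manifest service type, and SHA-256 is an assumption, since the description above only says "SHA hash".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// legacyService is a simplified stand-in for a service before the new field existed.
type legacyService struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

// serviceOmitEmpty adds the field with omitempty, as the description requires.
type serviceOmitEmpty struct {
	Name              string `json:"name"`
	Image             string `json:"image"`
	InterconnectGroup string `json:"interconnect_group,omitempty"`
}

// serviceAlways adds the field without omitempty, the case being avoided.
type serviceAlways struct {
	Name              string `json:"name"`
	Image             string `json:"image"`
	InterconnectGroup string `json:"interconnect_group"`
}

// hashJSON serializes v to JSON and returns the hex SHA-256 digest (assumed hash).
func hashJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	legacy := hashJSON(legacyService{Name: "web", Image: "nginx"})
	omit := hashJSON(serviceOmitEmpty{Name: "web", Image: "nginx"})
	always := hashJSON(serviceAlways{Name: "web", Image: "nginx"})

	fmt.Println(legacy == omit)   // true: the zero-value field is dropped
	fmt.Println(legacy == always) // false: "interconnect_group":"" changes the bytes
}
```

With `omitempty`, a service that sets no interconnect group serializes to the same bytes as before the field was added, so its hash is unchanged.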


