## What happened?
The inference pod is stuck in Pending with the scheduling event `2 node(s) didn't match Pod's node affinity/selector`. The pod's node selector is `models.ome.io/clusterbasemodel.llama-3-2-1b=Ready`, and both GPU nodes carry the label `models.ome.io/clusterbasemodel.llama-3-2-1b=Ready`.
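The mismatch can be checked by comparing the pod's rendered nodeSelector with the node labels; a minimal sketch, using the namespace and label key from the manifests below:

```shell
# Pending pod's nodeSelector as rendered by OME
kubectl get pods -n llama-1b-demo-2 -o jsonpath='{.items[*].spec.nodeSelector}'

# GPU nodes carrying the model-ready label
kubectl get nodes -l models.ome.io/clusterbasemodel.llama-3-2-1b=Ready
```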
## What did you expect to happen?
I would expect the pod to be scheduled on the GPU nodes, since its node selector matches the node label.
## How can we reproduce it (as minimally and precisely as possible)?
Below is the ClusterBaseModel YAML:
```yaml
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-2-1b
spec:
  vendor: meta
  version: "3.2"
  disabled: false
  modelType: llama
  modelArchitecture: LlamaForCausalLM
  modelParameterSize: "1B"
  maxTokens: 8192
  modelCapabilities:
    - text-to-text
  modelFormat:
    name: safetensors
    version: "1.0.0"
  modelFramework:
    name: transformers
    version: "4.43.0"
  storage:
    storageUri: "hf://meta-llama/Llama-3.2-1B-Instruct"
    path: "/models/llama-3.2-1b"
    key: "hf-token"
    parameters:
      secretKey: token
    nodeSelector:
      node.kubernetes.io/instance-type: g6.xlarge
```
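For completeness, the node labeling applied after the model download can be checked like this (the `clusterbasemodel` resource name is assumed from the kind above):

```shell
# Per-node value of the model-ready label (an empty column means unlabeled)
kubectl get nodes -L models.ome.io/clusterbasemodel.llama-3-2-1b

# Status of the ClusterBaseModel itself
kubectl get clusterbasemodel llama-3-2-1b -o yaml
```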
And below is the InferenceService YAML:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: llama-1b-demo-2
---
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-2-1b
  namespace: llama-1b-demo-2
spec:
  predictor:
    model:
      baseModel: llama-3-2-1b
      protocolVersion: openAI
    minReplicas: 1
    maxReplicas: 1
```
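Applying both manifests and describing the predictor pod reproduces the message quoted above; the file names here are just placeholders:

```shell
kubectl apply -f clusterbasemodel.yaml
kubectl apply -f inferenceservice.yaml

# The predictor pod stays Pending; its events show the affinity/selector failure
kubectl get pods -n llama-1b-demo-2
kubectl describe pods -n llama-1b-demo-2
```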
## Anything else we need to know?
## Environment
- OME version: 0.1.3
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration: NVIDIA L4 GPU
- OS (e.g., from `/etc/os-release`): Ubuntu 22.04
- Runtime (SGLang, vLLM, etc.) and version: SGLang
- Model being served (if applicable): llama 3.2 1B
- Install method (Helm, kubectl, etc.): Helm