[BUG] Inference Pod can't be scheduled #319

@strebeld

Description

What happened?

The inference pod is stuck in the Pending state with the message "2 node(s) didn't match Pod's node affinity/selector". The pod's node selector is "models.ome.io/clusterbasemodel.llama-3-2-1b=Ready", and both GPU nodes carry the label "models.ome.io/clusterbasemodel.llama-3-2-1b=Ready".

What did you expect to happen?

I would expect the pod to be scheduled on the GPU nodes, since its node selector matches the node label.
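
For reference, the scheduler message "didn't match Pod's node affinity/selector" covers two checks at once: the pod's `nodeSelector` and any required node affinity terms (`requiredDuringSchedulingIgnoredDuringExecution`). A node must pass both. The sketch below is a simplified Python model of that rule, not the scheduler's actual code: it ignores `matchFields` and the `Gt`/`Lt` operators. The affinity term and the `g6.2xlarge` instance type in the example are hypothetical and are there only to show how a matching selector can still fail; nothing in this report confirms that such a term exists on the pod.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def selector_matches(node_labels: Labels, node_selector: Labels) -> bool:
    """Every key/value pair in nodeSelector must equal a node label exactly."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def expression_matches(node_labels: Labels, expr: dict) -> bool:
    """Evaluate one matchExpressions entry (Gt/Lt are not modelled here)."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in node_labels and node_labels[key] in values
    if op == "NotIn":
        return node_labels.get(key) not in values
    if op == "Exists":
        return key in node_labels
    if op == "DoesNotExist":
        return key not in node_labels
    raise ValueError(f"operator not modelled in this sketch: {op}")


def term_matches(node_labels: Labels, term: dict) -> bool:
    """Expressions inside one term are ANDed; an empty term matches nothing."""
    expressions = term.get("matchExpressions", [])
    return bool(expressions) and all(
        expression_matches(node_labels, expr) for expr in expressions
    )


def node_fits(
    node_labels: Labels,
    node_selector: Labels,
    required_terms: Optional[List[dict]] = None,
) -> bool:
    """A node fits only if nodeSelector AND required affinity both pass.

    The nodeSelectorTerms themselves are ORed together.
    """
    if not selector_matches(node_labels, node_selector):
        return False
    if required_terms is None:
        return True
    return any(term_matches(node_labels, term) for term in required_terms)


# Label and selector taken from the report above.
pod_selector = {"models.ome.io/clusterbasemodel.llama-3-2-1b": "Ready"}
gpu_node = {
    "models.ome.io/clusterbasemodel.llama-3-2-1b": "Ready",
    "node.kubernetes.io/instance-type": "g6.2xlarge",  # hypothetical value
}

# Hypothetical required affinity term, for illustration only.
hypothetical_affinity = [
    {
        "matchExpressions": [
            {
                "key": "node.kubernetes.io/instance-type",
                "operator": "In",
                "values": ["g6.xlarge"],
            }
        ]
    }
]

print(selector_matches(gpu_node, pod_selector))                  # True
print(node_fits(gpu_node, pod_selector, hypothetical_affinity))  # False
```

Under this model, comparing only the node selector against the node labels is not enough to explain the scheduling result; the pending pod's `spec.affinity` would have to be checked as well.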

How can we reproduce it (as minimally and precisely as possible)?

```yaml
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-2-1b
spec:
  vendor: meta
  version: "3.2"
  disabled: false
  modelType: llama
  modelArchitecture: LlamaForCausalLM
  modelParameterSize: "1B"
  maxTokens: 8192
  modelCapabilities:
    - text-to-text
  modelFormat:
    name: safetensors
    version: "1.0.0"
  modelFramework:
    name: transformers
    version: "4.43.0"
  storage:
    storageUri: "hf://meta-llama/Llama-3.2-1B-Instruct"
    path: "/models/llama-3.2-1b"
    key: "hf-token"
    parameters:
      secretKey: token
    nodeSelector:
      node.kubernetes.io/instance-type: g6.xlarge
```

Below is the InferenceService YAML:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: llama-1b-demo-2
---
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-2-1b
  namespace: llama-1b-demo-2
spec:
  predictor:
    model:
      baseModel: llama-3-2-1b
      protocolVersion: openAI
    minReplicas: 1
    maxReplicas: 1
```


## Anything else we need to know?

<!-- Any additional context about the problem. -->

## Environment

- OME version: 0.1.3
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration: NVIDIA L4 GPU
- OS (e.g., from `/etc/os-release`): Ubuntu 22.04
- Runtime (SGLang, vLLM, etc.) and version: SGLang
- Model being served (if applicable): Llama 3.2 1B
- Install method (Helm, kubectl, etc.): Helm
