Open
Labels: kind/bug (Bug fix)
Description
What happened?
The benchmark job pod can be scheduled onto a node that doesn't actually have the model, because the pod spec lacks the nodeAffinity of the inference service it targets.
What did you expect to happen?
The benchmark job pod should carry the same nodeAffinity in its spec as the inference service, so that it lands on a node that has the model and can run successfully.
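For illustration, this is roughly the kind of affinity block the benchmark pod spec would need to inherit; the label key and value below are hypothetical placeholders, not the actual labels OME applies, since the point is only that whatever nodeAffinity the inference service pods carry should be copied over:

apiVersion: v1
kind: Pod
# ...
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Hypothetical label; mirror whatever the
              # InferenceService pods actually select on.
              - key: models.ome.io/llama-4-scout-17b-16e-instruct
                operator: In
                values: ["Ready"]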
How can we reproduce it (as minimally and precisely as possible)?
Start an inference service:

apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: config-test
  namespace: actions-runner-system
  annotations:
    sglang.deployed-by: "manual-test"
spec:
  engine:
    minReplicas: 1
    maxReplicas: 1
  model:
    name: llama-4-scout-17b-16e-instruct

Create a benchmark job:
apiVersion: ome.io/v1beta1
kind: BenchmarkJob
metadata:
  name: benchmark-llama-4-scout-17b-16e-instruct
  namespace: actions-runner-system
spec:
  podOverride:
    image: fra.ocir.io/idqj093njucb/xz-genai-bench:dev
  huggingFaceSecretReference:
    name: huggingface-secret
  endpoint:
    inferenceService:
      name: llama-4-scout-17b-16e-instruct
      namespace: actions-runner-system
  task: text-to-text
  maxTimePerIteration: 10
  maxRequestsPerIteration: 1000
  outputLocation:
    storageUri: "oci://n/idqj093njucb/b/ome-benchmark-results/o/official-sgl/test/llama-4-scout-17b-16e-instruct"
  parameters:
    auth: "instance_principal"
    region: "eu-frankfurt-1"

Anything else we need to know?
Environment
- OME version: ord.ocir.io/idqj093njucb/ome-manager:v0.1.4-36-g0e8110c
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g., from /etc/os-release):
- Runtime (SGLang, vLLM, etc.) and version:
- Model being served (if applicable):
- Install method (Helm, kubectl, etc.):