What happened?
After changing sth, e.g. a InferenceService/Runtime definition like how the probe should be like, etc, I see the auto-generated Deployment go through:
- Before: e.g. Deployment spec replica = 100
- After: Deployment spec replica = 1 for a while (this looks weird)
- After a while: Deployment spec replica = 100 (this is normal)
I have not digged this problem yet, thus not very sure whether it is ome bug or not, but given my helm chart is nothing but simply change the InferenceService/runtime. If you cannot reproduce feel free to ping me and I can try to ablate into a min sample.
What did you expect to happen?
How can we reproduce it (as minimally and precisely as possible)?
# Example YAML configuration that reproduces the issue
Anything else we need to know?
Environment
- OME version: 89f217e
- Kubernetes version (use
kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g., from
/etc/os-release):
- Runtime (SGLang, vLLM, etc.) and version:
- Model being served (if applicable):
- Install method (Helm, kubectl, etc.):