[BUG] OME seems to scale down Deployments to 1 replica temporarily whenever it changes something #397

@fzyzcjy

Description

What happened?

After changing something, e.g. an InferenceService/Runtime definition (such as how a probe is configured), I see the auto-generated Deployment go through the following states:

  • Before the change: Deployment spec.replicas = 100 (for example)
  • Right after the change: Deployment spec.replicas = 1 for a while (this looks wrong)
  • After a while: Deployment spec.replicas = 100 again (this is normal)
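
To help confirm the sequence above, here is a minimal sketch that scans a timeline of observed replica counts and reports every temporary dip below the expected value. It is only a diagnostic aid, not part of OME: the function name `find_transient_dips` and the sample format (timestamp in seconds, observed spec.replicas) are assumptions made for illustration. The samples could be collected, for instance, by polling `kubectl get deployment <name> -o jsonpath='{.spec.replicas}'` in a loop.

```python
from typing import List, Tuple

Sample = Tuple[float, int]        # (timestamp in seconds, observed spec.replicas)
Dip = Tuple[float, float, int]    # (dip start, recovery time, lowest replicas seen)


def find_transient_dips(samples: List[Sample], expected: int) -> List[Dip]:
    """Return every run where replicas fell below `expected` and later recovered.

    A dip that has not recovered by the last sample is not reported,
    because it cannot yet be called transient.
    """
    dips: List[Dip] = []
    start = None  # timestamp of the first sample in the current dip
    low = None    # lowest replica count seen in the current dip
    for ts, replicas in samples:
        if replicas < expected:
            if start is None:
                start, low = ts, replicas
            else:
                low = min(low, replicas)
        elif start is not None:
            # Replicas are back at (or above) the expected value: close the dip.
            dips.append((start, ts, low))
            start, low = None, None
    return dips


if __name__ == "__main__":
    # Hypothetical timeline matching the report: 100 -> 1 -> 100.
    timeline = [(0.0, 100), (5.0, 1), (10.0, 1), (15.0, 100)]
    for begin, end, lowest in find_transient_dips(timeline, expected=100):
        print(f"dip to {lowest} replicas from t={begin}s until t={end}s")
```

A timeline with timestamps makes it easier to correlate each dip with the moment the InferenceService/Runtime change was applied.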

I have not dug into this problem yet, so I am not sure whether it is an OME bug. However, the only thing my Helm chart change does is modify the InferenceService/Runtime. If you cannot reproduce it, feel free to ping me and I can try to reduce it to a minimal sample.

What did you expect to happen?

How can we reproduce it (as minimally and precisely as possible)?

# Example YAML configuration that reproduces the issue

Anything else we need to know?

Environment

  • OME version: 89f217e
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g., from /etc/os-release):
  • Runtime (SGLang, vLLM, etc.) and version:
  • Model being served (if applicable):
  • Install method (Helm, kubectl, etc.):
