
[Bug]: HunyuanOCR batching problem with variable sized images in a batch. #30342

@anker-c2

Description


Your current environment

Running the latest nightly vllm-openai Docker image with the following parameters:

command: "--model tencent/HunyuanOCR --trust-remote-code --dtype bfloat16
        --max-model-len 6144 --limit-mm-per-prompt '{\"image\": 1}'
        --max-num-seqs 256 --max-num-batched-tokens 16384 --gpu-memory-utilization 0.9 --swap-space 32 --enforce-eager"

🐛 Describe the bug

When running the tencent/HunyuanOCR model under load, another problem in the batching logic mixes up outputs from different images/inputs.

The problem appears to be in the model executor code and seems to occur when differently sized images are present in the same batch.
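As a toy illustration of this general failure mode (purely illustrative, not the actual vLLM code): when per-image feature sequences of different lengths share one flat buffer, splitting that buffer with offsets that assume a uniform length bleeds one image's data into another's slice.

```python
# Hypothetical per-image feature sequences; lengths differ because the
# images have different sizes (all names here are illustrative, not vLLM's).
feats_a = [0.0] * 3          # image A contributes 3 feature "tokens"
feats_b = [1.0] * 5          # image B contributes 5
flat = feats_a + feats_b     # one shared buffer for the whole batch

# Buggy split: assumes every image contributes the same number of tokens.
uniform = len(flat) // 2               # 4 -- wrong for both images
bad_a, bad_b = flat[:uniform], flat[uniform:]
assert bad_a[-1] == 1.0                # image B's data leaked into image A's slice

# Correct split: track each image's true length.
lens = [len(feats_a), len(feats_b)]
good_a, good_b = flat[:lens[0]], flat[lens[0]:]
assert all(v == 0.0 for v in good_a) and all(v == 1.0 for v in good_b)
```

With uniform-sized images the two splits coincide, which would explain why the bug only surfaces once differently sized images land in the same batch.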

The behavior was reproduced and described in this issue in the model's repository:
Tencent-Hunyuan/HunyuanOCR#60 (comment)

I have reliably reproduced the issue in my pipeline, but was not able to produce a minimal example that I can share, since the dataset that triggers the issue is confidential.

However, I have built a fix that sidesteps the issue by processing the images independently, and will open a PR.
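In spirit, the sidestep is just the following (a toy sketch, not the actual patch; `encode_one` stands in for whatever per-image encode path the model exposes):

```python
def encode_independently(images, encode_one):
    # One image per call: no shared buffer, so there is no offset
    # bookkeeping to get wrong across differently sized images.
    return [encode_one(img) for img in images]

# Dummy "encoder" whose output length tracks the input size.
imgs = [[0] * 3, [1] * 5]                       # two "images" of different size
outs = encode_independently(imgs, lambda im: [v + 1 for v in im])
assert [len(o) for o in outs] == [3, 5]         # each output matches its own image
```

The trade-off is losing some batched-preprocessing throughput in exchange for correctness.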

Additional Information:

  • Setting --max-num-seqs 1 to disable batching worked in my test.
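For completeness, this is the command from the environment section above with batching disabled; the only change is --max-num-seqs 1:

```
command: "--model tencent/HunyuanOCR --trust-remote-code --dtype bfloat16
        --max-model-len 6144 --limit-mm-per-prompt '{\"image\": 1}'
        --max-num-seqs 1 --max-num-batched-tokens 16384 --gpu-memory-utilization 0.9 --swap-space 32 --enforce-eager"
```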


Labels: bug (Something isn't working)