Your current environment
Running on the latest nightly Docker vllm-openai image using the following parameters:
command: "--model tencent/HunyuanOCR --trust-remote-code --dtype bfloat16
--max-model-len 6144 --limit-mm-per-prompt '{\"image\": 1}'
--max-num-seqs 256 --max-num-batched-tokens 16384 --gpu-memory-utilization 0.9 --swap-space 32 --enforce-eager"
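For readers not using Compose, the service above corresponds roughly to the following standalone invocation (a sketch only; the image tag, GPU flags, and port mapping are assumptions, not taken from my setup):

```shell
# Hypothetical docker run equivalent of the compose "command:" above.
# Image tag, --gpus, and -p are assumptions; the model flags are as reported.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:nightly \
  --model tencent/HunyuanOCR --trust-remote-code --dtype bfloat16 \
  --max-model-len 6144 --limit-mm-per-prompt '{"image": 1}' \
  --max-num-seqs 256 --max-num-batched-tokens 16384 \
  --gpu-memory-utilization 0.9 --swap-space 32 --enforce-eager
```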
🐛 Describe the bug
When running the tencent/HunyuanOCR model under load, the batching logic mixes up different images/inputs.
The problem appears to be inside the model executor code and seems to occur when differently sized images are present in the same batch.
The behavior has been produced and described in this Issue over at the model's repository:
Tencent-Hunyuan/HunyuanOCR#60 (comment)
I have reliably reproduced the issue in my pipeline, but I was not able to produce a minimal example I can share, since the dataset that triggers the issue is confidential.
However, I have built a fix that seems to sidestep the issue by processing the images independently, and I will open a PR.
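The idea behind the workaround can be sketched as a thin wrapper that feeds the model one image at a time instead of letting requests batch together. This is a minimal illustration, not the actual PR; `generate` is a hypothetical stand-in for whatever batched inference call your pipeline uses, not a real vLLM API:

```python
from typing import Any, Callable, List


def ocr_independently(
    images: List[Any],
    generate: Callable[[List[Any]], List[str]],
) -> List[str]:
    """Run OCR on each image in its own single-item batch.

    `generate` is a hypothetical batched inference callable (list of
    images in, list of texts out). Calling it with one image per request
    sidesteps any cross-sample mixing inside the batching logic.
    """
    results: List[str] = []
    for img in images:
        # One image per call: batch size is always 1.
        results.extend(generate([img]))
    return results
```

This trades throughput for correctness, similar in effect to running the server with `--max-num-seqs 1`, but only for the affected requests.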
Additional Information:
- Setting `--max-num-seqs 1` to deactivate batching worked in my test.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.