--override-tensor exps=CPU causes a performance regression on Vulkan (AMD) — opposite of reported CUDA behavior #24846
aivisionslab-studios asked this question in Q&A
Testing MoE models (Qwen3.5-35B-A3B, Q4_K_M/Q6_K) on an RX 580 8GB (Polaris/gfx803) via the Vulkan backend, on both Windows and Linux (Mesa RADV).
`--override-tensor exps=CPU` is documented as a CUDA optimization: it keeps the MoE expert tensors off the GPU to save VRAM/bandwidth on Nvidia setups. On Vulkan here it does the opposite, with a consistent regression in both environments.

Setup: Xeon E5-2690 v3, 32GB DDR4 ECC, llama.cpp built with `-DGGML_VULKAN=ON`, no ROCm/HIP anywhere in the stack.

My read: on Vulkan, redirecting expert tensors back and forth seems to add PCIe/transfer overhead that CUDA's memory model doesn't incur in the same way, but I don't have visibility into why CUDA benefits while Vulkan doesn't. Does anyone with more knowledge of the backend internals know whether this is expected, or whether it's worth filing as an actual issue?
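To make concrete which tensors the `exps=CPU` override actually moves, here is a minimal Python sketch of the pattern-to-buffer matching. It assumes that llama.cpp treats each comma-separated `pattern=BUFTYPE` pair as a regex searched anywhere in the tensor name, that the first matching rule wins, and that MoE tensors follow GGUF-style names such as `blk.0.ffn_up_exps.weight`. The tensor list, the `placement` helper and the `"GPU"` default are hypothetical and only for illustration; this is not llama.cpp's code.

```python
import re

# Hypothetical subset of GGUF-style tensor names for one MoE layer (assumed naming).
TENSOR_NAMES = [
    "token_embd.weight",
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_inp.weight",    # router
    "blk.0.ffn_gate_exps.weight",   # stacked expert weights
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
    "blk.0.ffn_gate_shexp.weight",  # shared expert, if the model has one
    "output.weight",
]


def parse_overrides(spec: str) -> list[tuple[re.Pattern, str]]:
    """Parse 'pattern=BUFTYPE[,pattern=BUFTYPE...]' into (compiled regex, buffer) pairs."""
    rules = []
    for item in spec.split(","):
        pattern, sep, buft = item.partition("=")
        if not sep or not pattern or not buft:
            raise ValueError(f"bad override: {item!r}")
        rules.append((re.compile(pattern), buft))
    return rules


def placement(names: list[str], spec: str, default: str = "GPU") -> dict[str, str]:
    """Map each tensor name to a buffer type; unmatched tensors keep the default."""
    rules = parse_overrides(spec)
    result = {}
    for name in names:
        result[name] = default
        for regex, buft in rules:
            if regex.search(name):  # match anywhere in the name (assumed semantics)
                result[name] = buft
                break  # first matching rule wins (assumption)
    return result


if __name__ == "__main__":
    for name, buft in placement(TENSOR_NAMES, "exps=CPU").items():
        print(f"{buft:4} {name}")
```

Under these assumptions only the three `*_exps` tensors land on the CPU; attention, the router (`ffn_gate_inp`) and any shared-expert (`shexp`) tensors stay on the GPU, so every MoE layer's forward pass crosses between the two devices.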