Feature Request: AMD GPU (ROCm) Backend Support
Summary
Add support for AMD GPUs via ROCm/HIP, similar to the existing Metal backend for Apple Silicon.
Hardware Tested
- AMD Ryzen AI Max+ 395 (16-core/32-thread APU)
- AMD Radeon 8060S Graphics (integrated GPU, 64GB VRAM)
- Ubuntu 26.04 LTS
- ROCm 6.x installed and working (verified with rocminfo)
Current Situation
- ds4 compiles on Linux with the `-DDS4_NO_METAL` flag
- The CPU backend works but is extremely slow for the 284B-parameter model
- llama.cpp with HIP works on the same hardware (182 t/s on a small model)
Expected Behavior
- Run ds4 on AMD GPUs, selected with a backend flag (e.g. a new `--backend gpu`) instead of falling back to `--backend cpu`
- Full model support (DeepSeek V4 Flash IQ2/Q4)
- KV cache persistence support
Implementation Suggestion
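HIP mirrors the CUDA runtime API and, like Metal, uses host code that dispatches compute kernels to the GPU, so the structure of the Metal backend should carry over. The main components needed:
- `ds4_hip.h` / `ds4_hip.c` - HIP tensor operations (analogous to ds4_metal.h/m)
- Port the Metal shaders to HIP compute kernels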
- Add backend selection in ds4_engine_options
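To make the backend-selection point concrete, here is a rough sketch of how it could look. Everything in it is an assumption for illustration: the `ds4_backend` enum, the `ds4_backend_options` struct (a stand-in for whatever field `ds4_engine_options` would actually gain), the helper functions, the `"hip"` name string, and the `DS4_USE_HIP` build flag are all hypothetical. Only `DS4_NO_METAL` comes from the current build.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend identifiers; the real ds4 names may differ. */
typedef enum {
    DS4_BACKEND_CPU,
    DS4_BACKEND_METAL,
    DS4_BACKEND_HIP,
    DS4_BACKEND_INVALID
} ds4_backend;

/* Hypothetical stand-in for a backend field inside ds4_engine_options. */
typedef struct {
    ds4_backend backend;
} ds4_backend_options;

/* Map the value of a --backend argument to a backend identifier.
 * The accepted strings are assumptions, not existing ds4 options. */
static ds4_backend ds4_backend_from_name(const char *name) {
    if (name == NULL) return DS4_BACKEND_INVALID;
    if (strcmp(name, "cpu") == 0) return DS4_BACKEND_CPU;
    if (strcmp(name, "metal") == 0) return DS4_BACKEND_METAL;
    if (strcmp(name, "hip") == 0) return DS4_BACKEND_HIP;
    return DS4_BACKEND_INVALID;
}

/* Report whether a backend was compiled into this binary.
 * DS4_NO_METAL is the existing flag; DS4_USE_HIP is a hypothetical one. */
static int ds4_backend_available(ds4_backend b) {
    switch (b) {
    case DS4_BACKEND_CPU:
        return 1;
    case DS4_BACKEND_METAL:
#ifdef DS4_NO_METAL
        return 0;
#else
        return 1;
#endif
    case DS4_BACKEND_HIP:
#ifdef DS4_USE_HIP
        return 1;
#else
        return 0;
#endif
    default:
        return 0;
    }
}
```

With something like this, option parsing could reject an unknown or uncompiled backend up front and report a clear error, rather than silently falling back to the CPU path.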
Additional Notes
- GPU detected as gfx1151, 64GB VRAM available
- ROCm stack is stable on this hardware
- Happy to test and provide feedback