Creating Device Context for Non-NVIDIA/Metal devices #1783
Replies: 3 comments 2 replies
Hello! Thanks for your interest in tract. tract-gpu is a support crate: we did Metal first, then moved on to CUDA, and started by extracting from tract-metal whatever we thought would also make sense for CUDA (or other GPUs). So, absolutely: if you want to start working on OpenCL support, your starting point should be extending/implementing the tract-gpu traits, similarly to tract-metal and tract-cuda.

You will first need to figure out what does what in OpenCL. The "Context" is a somewhat process-wide singleton that you can use to allocate memory, while the Metal queue and the CUDA stream allow scheduling operations. You will also need to find some bindings for OpenCL. We probably want to stay away from overly ambitious projects that actually allow authoring kernels in Rust (like rust-gpu); a driver-level binding focused on the Context (or its equivalent) and the Stream/CommandQueue is what you're after. Hopefully there is something not too heavy and maintained that could be used. For CUDA we went with cudarc: its ability to dynamically load CUDA was great, because we can ship a universal command-line binary that will use CUDA if it is there, but will gracefully handle a server where CUDA is missing.

If you can figure out these dependencies, then I guess we will be able to guide you through the next steps. We have been through that relatively recently, so it's still fresh in our memories :)
Pinging @LouisChourakiSonos for more insights on the GPU stuff
Hey @jbrockett-reach! I am the person who worked on tract-metal and is now working on tract-cuda. Very enthusiastic about a potential tract-opencl crate! Are you trying to run an LLM or more classic NNs?

I think mimicking the CUDA and Metal crates is a good approach. GPU work in tract is pretty recent, so it will help a lot to have implementations that are as uniform as possible in case we need to do some general refactoring. Hopefully tract-gpu is generic enough to let you do what you want with OpenCL, but if it is not the case, we can discuss possible modifications.

I suggest you start with a simple operator, like an element-wise operator, because the kernels are simple to write and you can iterate on them quickly. Actually, do you plan on writing the kernels yourself or taking them from some open source repository? On our side, we took inspiration from llama.cpp and Apple's MLX code, if that helps.

I don't really know much about OpenCL, so I hope the binding crate you have found will provide what you need. I am a bit worried by the fact that the project is not very active, but if you don't have the choice then go for it!

I hope I answered most of your questions. Don't hesitate to ask here if you have others! We are very grateful for your interest in tract and we look forward to running some models on OpenCL.
Hi there! I started using tract for an embedded project and it's been working out well. However, I'm using an embedded device with a GPU, and I would like to run my model's inference on it to free up the CPU. That said, my GPU is neither Metal- nor NVIDIA-based, so the Metal and CUDA crates won't work for me. My current model is stored in tflite format (though I could change this), if that's relevant. Though there is a GPU crate, there's currently no example and I can't quite gather how to set it up. Currently, my error is that I don't have a device context to set; how might I go about creating one?

I have OpenCL working on the device, so my current best guess is to try to mimic what's going on in the CUDA and Metal crates with OpenCL, but I'm not familiar enough with the process to know if there's a better approach.