Intel Hardware Support; oneAPI/Level Zero/Intel NEO #24873

daedaevibin · 2026-06-21T13:18:43Z

I am looking to contribute C++ optimization work for Intel compute hardware here and want to gauge where the community's current pain points are.

I see two distinct paths forward and would be happy to tackle either (or both):

1. Native Level Zero (ze_api) Backend: Bypassing the SYCL runtime entirely by implementing a direct ggml-levelzero backend targeting the Intel NEO runtime. This would eliminate the multi-gigabyte oneAPI toolchain requirement for end-users, drastically lower driver-overhead latency, and bring deployment in line with the minimal, dependency-free philosophy of llama.cpp.
2. SYCL/Level Zero Bridge Optimization: Deepening optimizations inside the current ggml-sycl.cpp host-side layer. Specifically looking at reducing latency bottlenecks during the prompt-processing to token-generation handoff, and refining Unified Shared Memory (USM) allocation and device synchronization barriers over the underlying Level Zero driver.

Are there structural roadblocks that have kept a raw Level Zero backend from being pursued yet, or is a highly optimized SYCL layer the preferred trajectory for the core team? I have the dev environment ready to benchmark and trace execution directly.

Related: #23313 & #5277

Current Testing Environment

Here is the local hardware setup I am utilizing for initial benchmarking, optimizations, and implementation testing:

| Component | Details / Specification |
| --- | --- |
| Host / Device | HP ProBook 450 G9 Laptop (daedev) |
| OS / Distro | CachyOS (x86_64) |
| Kernel | `7.0.12-1-cachyos-eevdf-lto` |
| Root Filesystem | XFS (`/dev/nvme1n1p2`) |
| CPU | 12th Gen Intel Core i3-1215U (6 Cores / 8 Threads) |
| Graphics | Intel Alder Lake-UP3 GT1 [UHD Graphics] (ADL GT2) |
| Kernel Driver | `xe` driver (Mesa 26.1.2-arch3.1) |
| Compute APIs | Vulkan 1.4.350, OpenGL 4.6, EGL 1.5 |
| Memory | 16 GiB Total (System) + 15.25 GiB zRAM Swap |

🧪 Call for Community Testers & Contributors

While this integrated setup is perfect for debugging low-level command queues, memory abstractions, and direct driver execution paths over the modern xe stack, testing across a wider matrix of Intel hardware is crucial.

If you have access to any of the following hardware and are willing to help test, benchmark, or collaborate on PR iterations, please reply to this thread:

Intel Arc Discrete GPUs (A-Series, B-Series/Battlemage)
Intel Core Ultra / Integrated Arc Graphics (Meteor Lake, Lunar Lake, Arrow Lake)
Intel Data Center GPUs (Flex / Max Series)
Systems running the legacy i915 kernel driver for older generation runtimes.

Any data points on latency, kernel compilation overhead, or throughput changes will be highly valuable once the code adjustments or backend prototypes go live!
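
For anyone reporting numbers, one possible way to reduce raw timings to comparable data points is sketched below. This is only an illustration: `median_ms` and `tokens_per_second` are hypothetical helper names, not part of llama.cpp, and the choice of the median (rather than the mean) is an assumption made to damp outliers such as first-run kernel compilation.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: median of per-run latencies in milliseconds.
// The vector is taken by value because the function sorts its own copy.
double median_ms(std::vector<double> samples) {
    if (samples.empty()) throw std::invalid_argument("no samples");
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Hypothetical helper: throughput in tokens per second for one run.
double tokens_per_second(std::size_t n_tokens, double elapsed_ms) {
    if (elapsed_ms <= 0.0) throw std::invalid_argument("elapsed_ms must be positive");
    return static_cast<double>(n_tokens) * 1000.0 / elapsed_ms;
}
```

For example, 128 tokens generated in 2000 ms gives `tokens_per_second(128, 2000.0)`, which evaluates to 64 tokens per second.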

arthw (Collaborator) · 2026-06-22T02:36:27Z

@daedaevibin
Good to hear these ideas.

I suggest using the Level Zero API inside the SYCL backend:

Building a new Level Zero backend would take more time to reach the quality and functionality of the SYCL backend.
A pure Level Zero backend would need to resolve known issues that SYCL has already fixed, which duplicates work.
The SYCL backend already supports calling the Level Zero API directly in code, so it is easy to replace existing SYCL code with Level Zero calls for better performance. Optimizing performance through the Level Zero API in this way is an important path that we will pursue.
The SYCL backend can adopt any good code based on the Level Zero API, but Level Zero code can't call SYCL code.

Layer stack (top to bottom): SYCL > Level Zero > Driver > Hardware

llama.cpp prefers to support as many users as possible.
It encourages supporting older OS and runtime versions instead of only the newest ones.
Ubuntu 26.04 and kernel 7.0.x are too new for most users.
It's better to support Ubuntu 22.04 and newer.

Thank you!

2 replies

daedaevibin Jun 22, 2026
Author

That makes perfect sense. Avoiding a duplicated implementation lifecycle and maximizing backward compatibility (ensuring solid support down to Ubuntu 22.04) are definitely the right priorities for upstream alignment.

My primary motivation is to bridge the massive gap in Intel support for consumer/budget hardware in the open-source AI ecosystem, making local inference accessible without requiring enterprise-grade hardware. Focusing on Path 2 satisfies that goal while keeping things maintainable.

I will pivot to deep-diving into ggml-sycl.cpp. I'll focus on profiling the prompt-processing to token-generation handoff and injecting direct Level Zero API calls inside the SYCL layer where we can cut down USM allocation overhead and driver-side synchronization barriers.
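
As a starting point for that profiling, a host-side phase timer could look like the sketch below. It is a minimal illustration, not code from ggml-sycl.cpp: `PhaseProfile` and `ScopedPhaseTimer` are hypothetical names, and the phase labels are free-form strings. One caveat to keep in mind: because device queues run asynchronously, a host wall-clock timer only reflects device work if the queue has been synchronized before the timer leaves scope.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator: total wall-clock milliseconds per named phase.
class PhaseProfile {
public:
    void add(const std::string& phase, double ms) { totals_[phase] += ms; }

    // Returns 0.0 for a phase that was never recorded.
    double total_ms(const std::string& phase) const {
        const auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
};

// RAII timer: adds the elapsed time to the profile when it leaves scope.
class ScopedPhaseTimer {
public:
    ScopedPhaseTimer(PhaseProfile& profile, std::string phase)
        : profile_(profile),
          phase_(std::move(phase)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedPhaseTimer() {
        const auto end = std::chrono::steady_clock::now();
        profile_.add(phase_,
                     std::chrono::duration<double, std::milli>(end - start_).count());
    }

    ScopedPhaseTimer(const ScopedPhaseTimer&) = delete;
    ScopedPhaseTimer& operator=(const ScopedPhaseTimer&) = delete;

private:
    PhaseProfile& profile_;
    std::string phase_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping the prompt-processing call, the handoff, and the first decode step in separate `ScopedPhaseTimer` scopes would give a per-phase breakdown that can be compared between the SYCL and Level Zero code paths.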

Before I spin up the profiler and begin tracing execution paths, are there specific known bottlenecks or fragile synchronization zones in the current SYCL driver abstraction that the core team has already flagged for attention?

Refactoring & Maintainability Note

While tracking down these latency bottlenecks, I also plan to keep an eye on long-term code health. I lean heavily toward structured optimizations that make the codebase cleaner and easier to maintain.

If I encounter fragmented utility logic, my approach will be to group commonly reused libraries and components into centralized, shared files. This should improve readability and ensure all parts of the backend have clean, uniform access to core abstractions without duplicating boilerplate.

arthw Jun 22, 2026
Collaborator

Got it, that's great!

In the current SYCL backend, you can use the environment variables GGML_SYCL_SUPPORT_LEVEL_ZERO_API and GGML_SYCL_USE_LEVEL_ZERO_API at build time and run time to use the Level Zero API in place of the SYCL API. This also makes it easy to switch between the two for debugging.
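
The switch described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the backend's actual logic: only the variable name GGML_SYCL_USE_LEVEL_ZERO_API comes from this thread, while `env_flag_enabled`, `DispatchPath`, `choose_dispatch_path`, the parsing rule (anything other than empty or "0" counts as enabled), and the idea that build-time support gates the run-time choice are all hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: a variable counts as enabled when it is set to
// anything other than "" or "0". The real backend may parse it differently.
bool env_flag_enabled(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) return false;
    const std::string s(value);
    return !s.empty() && s != "0";
}

enum class DispatchPath { Sycl, LevelZero };

// compiled_with_level_zero stands in for the build-time option
// (an assumption about how the two settings interact).
DispatchPath choose_dispatch_path(bool compiled_with_level_zero) {
    if (compiled_with_level_zero &&
        env_flag_enabled("GGML_SYCL_USE_LEVEL_ZERO_API")) {
        return DispatchPath::LevelZero;
    }
    return DispatchPath::Sycl;
}
```

Keeping the decision in one function like this makes A/B debugging straightforward: the same binary can be run twice with the variable set and unset, and any difference in output or timing is attributable to the chosen path.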

We don't actually use USM. There is a PR for USM usage that is to be merged as an experimental feature only.

I don't think the SYCL graph works well; a Level Zero graph could perhaps replace it to speed up kernel launches.
There is a blocking issue with supporting the "update" operation in the graph.
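
To illustrate in general terms why an "update" operation matters for graph-based launching, here is a toy model in plain C++. It is not a SYCL or Level Zero API and says nothing about the specific blocking issue; `ToyGraph` and its methods are hypothetical. The assumption behind it is that some kernel arguments change between decode steps, so a graph that cannot be updated in place would have to be recorded again for every step.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a recorded command graph; not a real driver API.
class ToyGraph {
public:
    using Kernel = std::function<void(int)>;

    // Record a node once; returns its index for later updates.
    std::size_t record(Kernel kernel, int arg) {
        nodes_.push_back({std::move(kernel), arg});
        ++record_count_;
        return nodes_.size() - 1;
    }

    // "Update": change a node's argument in place, without re-recording.
    void update_arg(std::size_t node, int arg) { nodes_.at(node).arg = arg; }

    // Replay every recorded node in order.
    void replay() const {
        for (const auto& n : nodes_) n.kernel(n.arg);
    }

    // Number of record() calls, i.e. how often the graph was (re)built.
    std::size_t record_count() const { return record_count_; }

private:
    struct Node {
        Kernel kernel;
        int arg;
    };
    std::vector<Node> nodes_;
    std::size_t record_count_ = 0;
};
```

In this model, calling `update_arg` followed by `replay` for each new token keeps `record_count()` at the number of nodes recorded initially, whereas a graph without an update operation would pay the recording cost on every step.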
