
Add Amd AIE target#56

Draft
ElectrikSpace wants to merge 7 commits into xtc-tools:main from
ElectrikSpace:dev/snoiry/aie_target

Conversation


@ElectrikSpace ElectrikSpace commented Feb 24, 2026

This PR contains a preliminary integration of the Amd AI Engine target in XTC.

The lowering relies on SDist (like the Mppa target) and on Mlir-AIE, but the AIE integration is still at an early stage, which results in major limitations for now (single core, no tiling, 1D tensors only, ...).

The XTC evaluation harness cannot be used for now, because the execution is performed through a wrapper provided with Mlir-AIE.

DO NOT MERGE FIRST
-> Only the last commit contains the feature; the others are dependencies from:

This patch allows an MlirTarget to provide its own way to apply
vectorization, overriding the transform-dialect-based vectorization.
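
A minimal sketch of what such a hook could look like (class and method names here, such as `apply_vectorization`, are hypothetical and not taken from XTC):

```python
class MlirTarget:
    """Hypothetical base class; real XTC names may differ."""

    def apply_vectorization(self, module: str) -> str:
        # Default path: transform-dialect-based vectorization
        # (represented here by a placeholder string rewrite).
        return f"transform-vectorized({module})"


class CustomTarget(MlirTarget):
    """A target that supplies its own vectorization strategy."""

    def apply_vectorization(self, module: str) -> str:
        # Target-specific strategy, e.g. a dedicated pass pipeline.
        return f"custom-vectorized({module})"


def lower(target: MlirTarget, module: str) -> str:
    # The lowering pipeline calls the hook, so each target
    # controls how (or whether) vectorization is applied.
    return target.apply_vectorization(module)
```
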
The goal of the work on the runtime is to facilitate the future
integration of non-host targets. New features are also added, especially
for accelerators.

Create common interfaces that runtimes will need to implement.
This patch introduces a runtime base named CommonRuntimeInterface which
is common to three derived classes of runtimes:
  - Host (no derived class for now)
  - AcceleratorDevice for accelerators
  - EmbeddedDevice for external embedded processors

Instances of AcceleratorDevice and EmbeddedDevice are called devices.
Unlike the current lazy runtime resolution, the concept of devices will
make it possible to handle multiple accelerators of the same class.
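
The hierarchy above could be sketched as follows (the method names and the `device_id` mechanism are assumptions for illustration, not the actual XTC API):

```python
from abc import ABC, abstractmethod


class CommonRuntimeInterface(ABC):
    """Shared runtime base; every runtime implements this."""

    @abstractmethod
    def run(self, kernel: str) -> str:
        """Execute a compiled kernel, returning a status string."""


class Host(CommonRuntimeInterface):
    """Runs directly on the host CPU (no derived class for now)."""

    def run(self, kernel: str) -> str:
        return f"host:{kernel}"


class AcceleratorDevice(CommonRuntimeInterface):
    """Accelerator runtime; each instance is one device."""

    def __init__(self, device_id: int):
        # A per-instance id is what lets several accelerators of
        # the same class coexist, unlike lazy global resolution.
        self.device_id = device_id

    def run(self, kernel: str) -> str:
        return f"device{self.device_id}:{kernel}"


class EmbeddedDevice(CommonRuntimeInterface):
    """External embedded processor; instances are devices too."""

    def __init__(self, device_id: int):
        self.device_id = device_id

    def run(self, kernel: str) -> str:
        return f"embedded{self.device_id}:{kernel}"
```
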
Apply the new runtime interfaces to the two existing runtimes:
 - Create a HostRuntime singleton class that derives from
   CommonRuntimeInterface
 - Create a GPUDevice class (a singleton for now) that derives from
   AcceleratorDevice to implement the GPU runtime.
Some method implementations are shared across the two, but adding a
specific implementation for a particular runtime is now easier.
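
The singleton pattern described above might look like this sketch (the `instance()` accessor and the stub base class are assumptions):

```python
class CommonRuntimeInterface:
    """Stub of the shared runtime base, to keep the sketch
    self-contained. In the PR, GPUDevice actually derives from
    AcceleratorDevice rather than directly from this base."""


class HostRuntime(CommonRuntimeInterface):
    """Host runtime exposed as a process-wide singleton."""

    _instance = None

    @classmethod
    def instance(cls) -> "HostRuntime":
        # Lazily create the single shared host runtime object.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class GPUDevice(CommonRuntimeInterface):
    """GPU runtime; a singleton for now, though a device-id based
    scheme could later address several GPUs."""

    _instance = None

    @classmethod
    def instance(cls) -> "GPUDevice":
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
```
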

The GPU target has been completely split from the Host target.
To avoid the code duplication that a clean split between the Host and
GPU runtimes would cause, common code portions have been factored out
into utils.

With this rework of the runtimes, the call path has been simplified;
confusing classes like Executor/Evaluator and functions like
load_and_evaluate have been removed.
When a computation is offloaded to an accelerator device, the
user can specify where input/output tensors live when the evaluation
begins. This makes it possible to simulate weight tensors being
transferred ahead of time. This feature is only supported for the MLIR
backend, by setting a "memref.on_device" attribute.
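
The placement logic could be modeled as in the sketch below; the attribute key "memref.on_device" comes from the description above, while the function and its data layout are hypothetical:

```python
def partition_tensors(tensors: dict) -> tuple:
    """Split evaluation tensors into those already resident on the
    device and those copied when evaluation begins.

    `tensors` maps a tensor name to its attribute dict; a tensor
    whose "memref.on_device" attribute is truthy is treated as
    transferred ahead of time (typically constant weights)."""
    resident, to_copy = [], []
    for name, attrs in tensors.items():
        if attrs.get("memref.on_device"):
            resident.append(name)
        else:
            to_copy.append(name)
    return resident, to_copy
```

For example, `partition_tensors({"weights": {"memref.on_device": True}, "input": {}})` reports `weights` as resident on the device and `input` as needing a copy at evaluation time.
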
Create a new Mppa compilation target for MLIR. This target is the
first one to implement ahead-of-time offloading of tensors to the device.

Create a new Mppa runtime derived from AcceleratorDevice. The runtime
can be configured in various respects (see the MppaConfig class). Execution
is supported on the ISS, QEMU, and hardware.
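
A hedged sketch of what such a configuration could look like; the `MppaConfig` name and the three execution modes come from the description, while the fields themselves are assumptions:

```python
from dataclasses import dataclass
from enum import Enum


class MppaBackend(Enum):
    # The three execution modes mentioned in the description.
    ISS = "iss"
    QEMU = "qemu"
    HARDWARE = "hardware"


@dataclass
class MppaConfig:
    """Illustrative only: the real MppaConfig in the patch may
    expose different fields and defaults."""
    backend: MppaBackend = MppaBackend.ISS
```
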

Note: In order to use the Mppa target, the Kalray Core Toolchain must be
installed. mlir_sdist and mlir_mppa must also be installed.
Rely on the kvxuks-catch pass to catch micro-kernels as a replacement for
the transform-dialect-based vectorization.
Add Amd AI Engine NPU target:
- Compiler target for the Mlir backend
- Runtime target (does not have an evaluation harness for now)
@ElectrikSpace ElectrikSpace marked this pull request as draft February 24, 2026 10:31
@ElectrikSpace ElectrikSpace self-assigned this Feb 24, 2026
