Rework runtime and add Mppa target #47
Conversation
PR #49 is required to run the CI on MacOS.
Update SDist requirements
The goal of the work on the runtime is to facilitate the future
integration of non-host targets. New features are also added, especially
for accelerators.
Create common interfaces that runtimes will need to implement.
This patch introduces a runtime base class named CommonRuntimeInterface
which is common to three derived kinds of runtimes:
- Host (no derived class for now)
- AcceleratorDevice for accelerators.
- EmbeddedDevice for external embedded processors.
Instances of AcceleratorDevice and EmbeddedDevice are called devices.
Unlike the current lazy runtime resolution, the concept of devices makes
it possible to handle multiple accelerators of the same class.
Apply the new runtime interfaces to the two existing runtimes:
- Create a HostRuntime singleton class that derives from CommonRuntimeInterface.
- Create a GPUDevice singleton (for now) class that derives from AcceleratorDevice to implement the GPU runtime.
Some method implementations are shared across the two, but adding a specific implementation for a particular runtime is now easier. The GPU target has been completely split from the Host target. To prevent code duplication due to this clear split between the Host and GPU runtimes, common code portions have been factorized into utils. With this rework of the runtimes, the call path has been simplified; confusing classes like Executor/Evaluator and functions like load_and_evaluate have been removed.
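A minimal sketch of how such a hierarchy might look in Python. The class names (CommonRuntimeInterface, AcceleratorDevice, EmbeddedDevice, HostRuntime, GPUDevice) come from the PR description; the `evaluate` method, its signature, and the singleton mechanics are illustrative assumptions, not the project's actual API:

```python
from abc import ABC, abstractmethod


class CommonRuntimeInterface(ABC):
    """Base interface every runtime implements (method is illustrative)."""

    @abstractmethod
    def evaluate(self, payload: str) -> str: ...


class AcceleratorDevice(CommonRuntimeInterface, ABC):
    """Runtimes for accelerators; instances are called devices."""


class EmbeddedDevice(CommonRuntimeInterface, ABC):
    """Runtimes for external embedded processors; instances are called devices."""


class HostRuntime(CommonRuntimeInterface):
    """Host runtime, a singleton deriving directly from the common interface."""

    _instance = None

    def __new__(cls) -> "HostRuntime":
        # Hypothetical singleton mechanism: reuse the one instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def evaluate(self, payload: str) -> str:
        return f"host:{payload}"


class GPUDevice(AcceleratorDevice):
    """GPU runtime implemented as an accelerator device."""

    def evaluate(self, payload: str) -> str:
        return f"gpu:{payload}"
```

With this shape, shared behavior lives in the base classes while each runtime only overrides what is specific to it.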
In the context of computation offloaded to an accelerator device, the user can specify where input/output tensors live when the evaluation begins. This makes it possible to simulate weight tensors being transferred ahead of time. This feature is only supported by the MLIR backend, by setting a "memref.on_device" attribute.
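The semantics can be illustrated with a toy model (not the project's API): tensors flagged as already living on the device are skipped when planning host-to-device transfers at evaluation time.

```python
def plan_transfers(tensors: list[tuple[str, bool]]) -> list[str]:
    """Given (name, on_device) pairs, return the tensors that still
    need to be copied from the host when evaluation begins."""
    return [name for name, on_device in tensors if not on_device]


# B and C are assumed already resident on the accelerator, e.g. weights
# transferred ahead of time; only A must be copied at evaluation time.
inputs = [("A", False), ("B", True), ("C", True)]
print(plan_transfers(inputs))  # prints ['A']
```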
Create a new Mppa compilation target for MLIR. This target is the first one to implement the ahead-of-time offloading of tensors onto the device. Create a new Mppa runtime derived from AcceleratorDevice. The runtime can be configured in various aspects (check the MppaConfig class). Execution is supported on ISS, Qemu, and hardware.
Note: in order to use the Mppa target, the Kalray Core Toolchain must be installed. mlir_sdist and mlir_mppa must also be installed.
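As a hedged sketch of what such a configuration object could look like: only the class name and the three execution modes (ISS, Qemu, hardware) come from the description above; the field names, defaults, and validation are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MppaConfig:
    """Hypothetical sketch of a Mppa runtime configuration."""

    # Execution backend: "iss", "qemu", or "hw" (from the PR description).
    execution: str = "iss"
    # Assumed field: where the Kalray Core Toolchain is installed.
    toolchain_dir: str = "/opt/kalray"

    def __post_init__(self) -> None:
        if self.execution not in ("iss", "qemu", "hw"):
            raise ValueError(f"unknown execution mode: {self.execution}")
```

A device could then be instantiated as, e.g., `MppaConfig(execution="qemu")` when hardware is not available.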
Rely on the kvxuks-catch pass to catch micro-kernels, replacing the transform-dialect-based vectorization.
@ElectrikSpace I do not understand the status of this review. The description still seems to depend on a review which was merged, and specifies to look only at the last commit. Though it seems that you rebased it already. But there are still 5 commits.

I didn't update the description because the status of the dependent PR was set to merged automatically, but I can remove it.
guillon left a comment
Thanks for the proposal, it's very cool.
In addition to inline comments:
- you may add to docs/develop/optional_backends.md, in the sdist section, possibly the way to test for mppa, as we do not have automation there yet; I can verify if I can run it on our local machine.
- also for the nvgpu, it seems that there are tests missing; can you add there also a section for this target and how to run it? I may try to run it on a grid 5000 machine with a GPU.
number: int,
min_repeat_ms: int,
cfunc: CFunc,
args_tuples: list[Any],
Please verify why it is args_tuples here while it is args in evaluate; the two differ.
from xtc.itf.runtime.common import CommonRuntimeInterface

class EmbeddedDevice(CommonRuntimeInterface, ABC):
    """
    ...

# TODO
#
from .HostRuntime import HostRuntime

__all__ = ["HostRuntime"]
You do not really need __all__ there, as the default is to expose all names except _*.
self._payload_name = payload_name
self._file_name = file_name
self._file_type = file_type
assert self._file_type == "shlib", "only support shlib for JIR Module"
Replace "JIR Module" with "GPU Module" in the assertion message.
def np_init(shape: tuple, dtype: str) -> numpy.typing.NDArray[Any]:
def np_init(shape: tuple, dtype: str, **attrs: Any) -> numpy.typing.NDArray[Any]:
Why do you need extra **attrs here?
mppa = MppaDevice()

I, J, K, dtype = 4, 8, 16, "float32"
a = O.tensor((I, K), dtype, name="A")               # A lives on the host
b = O.tensor((K, J), dtype, name="B", device=mppa)  # B lives on the accelerator

with O.graph(name="matmul") as gb:
    O.matmul(a, b, name="C", device=mppa)           # C must live on the accelerator
# CHECK-NEXT: module attributes {transform.with_named_sequence} {
# CHECK-NEXT: sdist.processor_mesh @processor_mesh from @memory_mesh = <["px"=1, "py"=1, "psx"=2, "psy"=8]>
# CHECK-NEXT: sdist.memory_mesh @memory_mesh = <["mx"=1, "my"=1]>
# CHECK-NEXT: func.func @matmul(%arg0: memref<4x16xf32> {llvm.noalias}, %arg1: memref<16x8xf32> {llvm.noalias, memref.on_device}, %arg2: memref<4x8xf32> {llvm.noalias, memref.on_device}) {
Is the buffer A transferred to the accelerator at some point, or just read directly from main memory?
[ `uname -s` = Darwin ] || env XTC_MLIR_TARGET=nvgpu lit -v tests/filecheck/backends tests/filecheck/mlir_loop tests/filecheck/evaluation

check-lit-mppa:
	env XTC_MLIR_TARGET=mppa lit -v -j 1 tests/filecheck/backends/target_mppa
I guess you should exclude darwin for this target, see above
This PR reworks the runtime to facilitate the support of new targets, especially accelerators.
It also proposes an integration of the Kalray Mppa target.