[CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel #18604
Conversation
Summary of Changes (Gemini Code Assist): This pull request integrates support for Programmatic Dependent Kernel Launch (PDL) into the TVM CUDA FFI. The change is intended to improve the performance and expressiveness of dynamic CUDA workloads by letting GPU kernels trigger subsequent kernel launches directly, reducing the overhead of host CPU synchronization. The CUDA runtime is modified to recognize a new flag in the function metadata, which then selects the more advanced CUDA kernel launch APIs.
Code Review
This pull request adds support for Programmatic Dependent Kernel Launch (PDL) in the CUDA FFI, which is a great feature for dynamic workloads. The implementation correctly uses cuLaunchKernelEx when the feature is enabled for a function.
My review focuses on improving robustness and ensuring backward compatibility. I've pointed out a couple of areas for improvement:
- In cuda_module.cc, I suggest zero-initializing the CUDA launch configuration structs to prevent potential issues with uninitialized data (see the sketch after this comment).
- In meta_data.h and file_utils.cc, I've identified serialization issues that could break backward compatibility. Specifically, the new has_programmatic_dependent_launch flag needs to be handled correctly when loading older module metadata, both for the JSON and binary formats. The binary format handling, in particular, appears to introduce a breaking change.
Please take a look at the detailed comments. Addressing these points will make the changes more robust and maintainable.
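A minimal sketch of the zero-initialization suggestion, assuming the CUlaunchConfig / CUlaunchAttribute structs from the CUDA driver API (variable names are illustrative, not the actual cuda_module.cc code):

```cpp
// Value-initialize so every field that is not explicitly assigned afterwards
// (e.g. attrs, numAttrs, or fields added by newer CUDA headers) starts at zero
// instead of holding indeterminate stack data.
CUlaunchConfig config = {};
CUlaunchAttribute attr = {};
```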
Thanks a lot for the contribution. To align with the existing implementation, we can use a launch param tag; see the LaunchParamConfig implementation. The launch param tag can be specified as part of the launch params, e.g. we can use constexpr const char* kUseProgramaticDependentLaunch = "tir.use_programtic_dependent_launch";
This way we can directly specify the PDL property in TVMScript as a kernel function attribute.
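For illustration, a hedged sketch of how such a tag might be declared and detected; the helper function and its placement are assumptions, and in TVM the matching would live inside LaunchParamConfig:

```cpp
#include <string>
#include <vector>

// Hypothetical tag constant, as suggested above.
constexpr const char* kUseProgramaticDependentLaunch =
    "tir.use_programtic_dependent_launch";

// Illustrative helper: scan a kernel's launch_param_tags for the PDL tag.
// Tags that match a known boolean flag are consumed as flags rather than
// being mapped to a thread axis.
inline bool UseProgrammaticDependentLaunch(
    const std::vector<std::string>& launch_param_tags) {
  for (const std::string& tag : launch_param_tags) {
    if (tag == kUseProgramaticDependentLaunch) return true;
  }
  return false;
}
```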
Thanks for the detailed guidance. I have updated the PR according to your suggestions. Please let me know if there are any further comments or improvements needed.
Could you extend this implementation to include cuLaunchCooperativeKernel as well?
OK, I'd be happy to work on it.
This patch adds support for Programmatic Dependent Kernel Launch (PDL) in the TVM CUDA FFI layer. PDL enables launching dependent kernels on the GPU without host intervention, improving performance and expressiveness for dynamic CUDA workloads.
Refer to NVIDIA documentation for PDL semantics:
https://docs.nvidia.com/cuda/cuda-programming-guide/03-advanced/advanced-host-programming.html#programmatic-dependent-kernel-launch
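As an illustration of what a PDL-enabled dispatch looks like at the driver-API level, here is a sketch assuming the runtime switches to cuLaunchKernelEx with the programmatic-stream-serialization attribute; the helper name and parameters are placeholders, not the actual cuda_module.cc code:

```cpp
#include <cuda.h>

// Sketch: launch `func` with the programmatic-stream-serialization attribute,
// which allows it to begin before the preceding kernel on the same stream has
// fully completed. The preceding kernel signals readiness via
// cudaTriggerProgrammaticLaunchCompletion() and this kernel waits via
// cudaGridDependencySynchronize(). The overlap requires a recent CUDA toolkit
// and an sm_90 (Hopper) or newer GPU.
CUresult LaunchWithPDL(CUfunction func, CUstream stream, void** kernel_args,
                       unsigned grid_x, unsigned grid_y, unsigned grid_z,
                       unsigned block_x, unsigned block_y, unsigned block_z,
                       unsigned shared_mem_bytes) {
  CUlaunchConfig config = {};  // zero-initialize, per the review comment above
  config.gridDimX = grid_x;
  config.gridDimY = grid_y;
  config.gridDimZ = grid_z;
  config.blockDimX = block_x;
  config.blockDimY = block_y;
  config.blockDimZ = block_z;
  config.sharedMemBytes = shared_mem_bytes;
  config.hStream = stream;

  CUlaunchAttribute attr = {};
  attr.id = CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION;
  attr.value.programmaticStreamSerializationAllowed = 1;
  config.attrs = &attr;
  config.numAttrs = 1;

  return cuLaunchKernelEx(&config, func, kernel_args, /*extra=*/nullptr);
}
```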
In addition, this patch extends the CUDA FFI layer to support cooperative kernel launches via cuLaunchCooperativeKernel. Cooperative kernels allow grid-wide synchronization and are required for certain multi-stage or producer–consumer GPU workloads. When a kernel is marked for cooperative launch, it will be dispatched using cuLaunchCooperativeKernel instead of the standard cuLaunchKernel.
Refer to NVIDIA documentation for cooperative kernel usage:
https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cooperative-groups.html#when-to-use-cudalaunchcooperativekernel
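For comparison, a cooperative dispatch at the driver-API level might look like the following sketch; the helper name and parameters are placeholders. Note that cuLaunchCooperativeKernel takes no extra-arguments pointer and that the grid must be sized so all blocks can be resident simultaneously.

```cpp
#include <cuda.h>

// Sketch: dispatch `func` as a cooperative kernel so device code may use
// cooperative_groups::this_grid().sync() for grid-wide synchronization.
// The device must report CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH, and
// cuOccupancyMaxActiveBlocksPerMultiprocessor can be used to bound the grid
// so every block fits on the GPU at once.
CUresult LaunchCooperative(CUfunction func, CUstream stream, void** kernel_args,
                           unsigned grid_x, unsigned grid_y, unsigned grid_z,
                           unsigned block_x, unsigned block_y, unsigned block_z,
                           unsigned shared_mem_bytes) {
  return cuLaunchCooperativeKernel(func,
                                   grid_x, grid_y, grid_z,
                                   block_x, block_y, block_z,
                                   shared_mem_bytes, stream, kernel_args);
}
```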