[CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel #18604
Conversation
Summary of Changes (Gemini Code Assist): This pull request integrates support for Programmatic Dependent Kernel Launch (PDL) into the TVM CUDA FFI. The change is intended to improve the performance and expressiveness of dynamic CUDA workloads by letting GPU kernels trigger subsequent kernel launches directly, reducing the overhead of host CPU synchronization. The CUDA runtime is modified to recognize a new flag in the function metadata, which then selects the more advanced CUDA kernel launch APIs.
Code Review
This pull request adds support for Programmatic Dependent Kernel Launch (PDL) in the CUDA FFI, which is a great feature for dynamic workloads. The implementation correctly uses cuLaunchKernelEx when the feature is enabled for a function.
My review focuses on improving robustness and ensuring backward compatibility. I've pointed out a couple of areas for improvement:
- In cuda_module.cc, I suggest zero-initializing the CUDA launch configuration structs to prevent potential issues with uninitialized data (see the sketch after this comment).
- In meta_data.h and file_utils.cc, I've identified serialization issues that could break backward compatibility. Specifically, the new has_programmatic_dependent_launch flag needs to be handled correctly when loading older module metadata, both for the JSON and binary formats. The binary format handling, in particular, appears to introduce a breaking change.
Please take a look at the detailed comments. Addressing these points will make the changes more robust and maintainable.
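A minimal sketch of the zero-initialization suggestion, assuming the CUlaunchConfig / CUlaunchAttribute structs from the CUDA driver API (variable names are illustrative, not the actual cuda_module.cc code):

```cpp
// Value-initialize so every field that is not explicitly assigned afterwards
// (e.g. attrs, numAttrs, or fields added by newer CUDA headers) starts at zero
// instead of holding indeterminate stack data.
CUlaunchConfig config = {};
CUlaunchAttribute attr = {};
```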
Thanks a lot for the contribution. To align with the existing implementation, we can use a launch param tag; see the LaunchParamConfig implementation. The launch param tag can be specified as part of the launch params, e.g. we can use constexpr const char* kUseProgramaticDependentLaunch = "tir.use_programtic_dependent_launch";
This way we can directly specify the PDL property in TVMScript as a kernel function attribute.
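For illustration, a hedged sketch of how such a tag might be declared and detected; the helper function and its placement are assumptions, and in TVM the matching would live inside LaunchParamConfig:

```cpp
#include <string>
#include <vector>

// Hypothetical tag constant, as suggested above.
constexpr const char* kUseProgramaticDependentLaunch =
    "tir.use_programtic_dependent_launch";

// Illustrative helper: scan a kernel's launch_param_tags for the PDL tag.
// Tags that match a known boolean flag are consumed as flags rather than
// being mapped to a thread axis.
inline bool UseProgrammaticDependentLaunch(
    const std::vector<std::string>& launch_param_tags) {
  for (const std::string& tag : launch_param_tags) {
    if (tag == kUseProgramaticDependentLaunch) return true;
  }
  return false;
}
```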
Thanks for the detailed guidance. I have updated the PR according to your suggestions. Please let me know if there are any further comments or improvements needed.
Could you extend this implementation to include cuLaunchCooperativeKernel as well?
OK, I'd be happy to work on it.
This patch adds support for Programmatic Dependent Kernel Launch (PDL) in the TVM CUDA FFI layer. PDL enables launching dependent kernels on the GPU without host intervention, improving performance and expressiveness for dynamic CUDA workloads.
Refer to NVIDIA documentation for PDL semantics:
https://docs.nvidia.com/cuda/cuda-programming-guide/03-advanced/advanced-host-programming.html#programmatic-dependent-kernel-launch
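As an illustration of what a PDL-enabled dispatch looks like at the driver-API level, here is a sketch assuming the runtime switches to cuLaunchKernelEx with the programmatic-stream-serialization attribute; the helper name and parameters are placeholders, not the actual cuda_module.cc code:

```cpp
#include <cuda.h>

// Sketch: launch `func` with the programmatic-stream-serialization attribute,
// which allows it to begin before the preceding kernel on the same stream has
// fully completed. The preceding kernel signals readiness via
// cudaTriggerProgrammaticLaunchCompletion() and this kernel waits via
// cudaGridDependencySynchronize(). The overlap requires a recent CUDA toolkit
// and an sm_90 (Hopper) or newer GPU.
CUresult LaunchWithPDL(CUfunction func, CUstream stream, void** kernel_args,
                       unsigned grid_x, unsigned grid_y, unsigned grid_z,
                       unsigned block_x, unsigned block_y, unsigned block_z,
                       unsigned shared_mem_bytes) {
  CUlaunchConfig config = {};  // zero-initialize, per the review comment above
  config.gridDimX = grid_x;
  config.gridDimY = grid_y;
  config.gridDimZ = grid_z;
  config.blockDimX = block_x;
  config.blockDimY = block_y;
  config.blockDimZ = block_z;
  config.sharedMemBytes = shared_mem_bytes;
  config.hStream = stream;

  CUlaunchAttribute attr = {};
  attr.id = CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION;
  attr.value.programmaticStreamSerializationAllowed = 1;
  config.attrs = &attr;
  config.numAttrs = 1;

  return cuLaunchKernelEx(&config, func, kernel_args, /*extra=*/nullptr);
}
```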
In addition, this patch extends the CUDA FFI layer to support cooperative kernel launches via cuLaunchCooperativeKernel. Cooperative kernels allow grid-wide synchronization and are required for certain multi-stage or producer–consumer GPU workloads. When a kernel is marked for cooperative launch, it will be dispatched using cuLaunchCooperativeKernel instead of the standard cuLaunchKernel.
Refer to NVIDIA documentation for cooperative kernel usage:
https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cooperative-groups.html#when-to-use-cudalaunchcooperativekernel
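For comparison, a cooperative dispatch at the driver-API level might look like the following sketch; the helper name and parameters are placeholders. Note that cuLaunchCooperativeKernel takes no extra-arguments pointer and that the grid must be sized so all blocks can be resident simultaneously.

```cpp
#include <cuda.h>

// Sketch: dispatch `func` as a cooperative kernel so device code may use
// cooperative_groups::this_grid().sync() for grid-wide synchronization.
// The device must report CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH, and
// cuOccupancyMaxActiveBlocksPerMultiprocessor can be used to bound the grid
// so every block fits on the GPU at once.
CUresult LaunchCooperative(CUfunction func, CUstream stream, void** kernel_args,
                           unsigned grid_x, unsigned grid_y, unsigned grid_z,
                           unsigned block_x, unsigned block_y, unsigned block_z,
                           unsigned shared_mem_bytes) {
  return cuLaunchCooperativeKernel(func,
                                   grid_x, grid_y, grid_z,
                                   block_x, block_y, block_z,
                                   shared_mem_bytes, stream, kernel_args);
}
```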