Avoid copying output from GPU to CPU for ASR runner #16235

larryliu0820 · 2025-12-12T23:34:08Z

Summary

This PR adds the ability to skip copying GPU outputs back to CPU for specific methods in the CUDA backend, and enables conditional CUDA compilation for the ASR runner.

Changes

CUDA Backend (`backends/cuda/runtime/cuda_backend.cpp`)

Changed the `skip_copy_output_to_cpu_for_method` backend option from a boolean to a string that holds a method name
The option can be set via `set_option()` after `init()` has been called, so it can be configured at runtime
During `execute()`, the backend compares the configured method name against the handle's method name and skips the GPU→CPU output copy when they match
Access to the skip-copy method name is guarded by a mutex, so reads and writes are thread-safe

Usage example:

BackendOptions<1> options;
options.set_option("skip_copy_output_to_cpu_for_method", "encode");
set_option("CudaBackend", options.view());
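The comparison described above can be sketched in isolation as follows. This is a minimal standalone illustration, not the code in `cuda_backend.cpp`: the class `SkipCopyConfig` and its members `set_method` and `should_skip` are hypothetical names, and treating an empty string as "never skip" is an assumption.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for the backend state that stores the option value.
class SkipCopyConfig {
 public:
  // Called when the option is set; an empty string is assumed to mean
  // "never skip the copy".
  void set_method(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    method_name_ = name;
  }

  // Called at execute time with the method name stored on the delegate
  // handle. Returns true only when a name is configured and it matches.
  bool should_skip(const std::string& handle_method) const {
    std::lock_guard<std::mutex> guard(mutex_);
    return !method_name_.empty() && method_name_ == handle_method;
  }

 private:
  mutable std::mutex mutex_;  // guards method_name_
  std::string method_name_;
};
```

With this shape, configuring `"encode"` makes only handles whose method name is `"encode"` skip the copy; every other method keeps the default behaviour of copying outputs to CPU.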

AOTI Delegate Handle (`backends/aoti/aoti_delegate_handle.h`)

Added a `method_name` field to `AOTIDelegateHandle` to record which method each delegate handle corresponds to

ASR Runner CMake (`extension/asr/runner/CMakeLists.txt`)

Added conditional CUDA support: when `EXECUTORCH_BUILD_CUDA` is enabled and `CUDAToolkit` is found, the `CUDA_AVAILABLE` compile definition is added
This lets the ASR runner guard CUDA-aware code paths behind `CUDA_AVAILABLE` at compile time
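As a hedged illustration of what the compile definition enables, runner code could branch on `CUDA_AVAILABLE` with the preprocessor. The function `output_location` below is a hypothetical example and is not taken from the PR.

```cpp
#include <string>

// Hypothetical helper: reports where this build expects method outputs
// to live, based on whether CUDA_AVAILABLE was defined at compile time.
std::string output_location() {
#ifdef CUDA_AVAILABLE
  return "gpu";  // CUDA-aware path compiled in
#else
  return "cpu";  // fallback path when CUDA is not built
#endif
}
```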

Motivation

When running multi-method models (e.g., prefill + decode for LLMs, or encoder + decoder for ASR), some methods benefit from keeping outputs on GPU to avoid unnecessary memory copies between methods. This change enables fine-grained control over which method(s) skip the GPU→CPU copy.

Test Plan

Build with CUDA enabled
Verify that the `skip_copy_output_to_cpu_for_method` option skips the GPU→CPU copy for the specified method
Verify that all other methods still copy outputs to CPU by default

pytorch-bot · 2025-12-12T23:34:12Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16235

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/home/ec2-user/actions-runner/_work/pytorch/pytorch/.github/actions/check-tpu'

✅ No Failures

As of commit aa293f9 with merge base 33ec615:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Dec 12, 2025

larryliu0820 force-pushed the avoid_copy_output branch 2 times, most recently from f3e30a4 to d1df034 on December 13, 2025 00:09

larryliu0820 added the release notes: desktop label Dec 13, 2025

larryliu0820 force-pushed the avoid_copy_output branch from d1df034 to 1b3c35c on December 13, 2025 00:17

larryliu0820 changed the title ~~Avoid copying output from GPU to CPU~~ Avoid copying output from GPU to CPU for ASR runner Dec 13, 2025

larryliu0820 marked this pull request as ready for review December 13, 2025 00:22

larryliu0820 requested a review from kirklandsign as a code owner December 13, 2025 00:22

larryliu0820 temporarily deployed to upload-benchmark-results December 13, 2025 01:25 — with GitHub Actions Inactive

Avoid copying output from GPU to CPU

aa293f9

larryliu0820 force-pushed the avoid_copy_output branch from 1b3c35c to aa293f9 on December 13, 2025 23:35

larryliu0820 temporarily deployed to upload-benchmark-results December 14, 2025 01:04 — with GitHub Actions Inactive
