@larryliu0820 larryliu0820 commented Dec 12, 2025

Summary

This PR adds the ability to skip copying GPU outputs back to CPU for specific methods in the CUDA backend, and enables conditional CUDA compilation for the ASR runner.

Changes

CUDA Backend (backends/cuda/runtime/cuda_backend.cpp)

  • Changed the skip_copy_output_to_cpu_for_method backend option from a boolean to a string that accepts a method name
  • The option can be set via set_option() after init() is called, allowing runtime configuration
  • During execute(), the backend compares the configured method name against the handle's method name to decide whether to skip the GPU→CPU output copy
  • Access to the skip-copy method name is guarded by a mutex for thread safety

Usage example:

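// Request that outputs of the "encode" method stay on the GPU.
BackendOptions<1> options;
options.set_option("skip_copy_output_to_cpu_for_method", "encode");
// Apply the option to the CUDA backend; this can be done after init().
set_option("CudaBackend", options.view());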
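
For reviewers who want a mental model of the execute()-time decision described in the bullets above, here is a minimal sketch. It is not the code in cuda_backend.cpp: `DelegateHandleSketch` and `SkipCopyOption` are hypothetical names, and treating an empty option value as "no method skips the copy" is an assumption made for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for AOTIDelegateHandle; only the field relevant here.
struct DelegateHandleSketch {
  std::string method_name;  // method this delegate handle was created for
};

// Hypothetical holder for the skip-copy option value.
class SkipCopyOption {
 public:
  // Called from a set_option()-style entry point; may run on any thread.
  void set_method(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    skip_method_ = name;
  }

  // Called during execute(): true when the GPU-to-CPU copy should be skipped.
  bool should_skip_copy(const DelegateHandleSketch& handle) const {
    std::lock_guard<std::mutex> guard(mutex_);
    // Assumption: an empty option means every method copies its outputs.
    return !skip_method_.empty() && skip_method_ == handle.method_name;
  }

 private:
  mutable std::mutex mutex_;  // guards skip_method_
  std::string skip_method_;
};
```

With this sketch, a handle whose method_name is "encode" skips the copy once set_method("encode") has been called, while a handle for any other method still copies its outputs to CPU.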

AOTI Delegate Handle (backends/aoti/aoti_delegate_handle.h)

  • Added method_name field to AOTIDelegateHandle to track which method each delegate handle corresponds to

ASR Runner CMake (extension/asr/runner/CMakeLists.txt)

  • Added conditional CUDA support: when EXECUTORCH_BUILD_CUDA is enabled and CUDAToolkit is found, the CUDA_AVAILABLE compile definition is added
  • This allows ASR runner code to conditionally compile CUDA-aware code paths
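
As a rough illustration of what the CUDA_AVAILABLE definition enables, runner code could gate a CUDA-specific choice at compile time. The helper below is hypothetical and is not taken from the ASR runner; only the CUDA_AVAILABLE macro and the "encode" method name come from this description.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the name of the method whose outputs should
// stay on the GPU, or an empty string when the build has no CUDA support.
inline std::string method_to_keep_on_gpu() {
#ifdef CUDA_AVAILABLE
  // Defined by CMake only when EXECUTORCH_BUILD_CUDA is on and CUDAToolkit
  // was found, so this branch compiles only in CUDA-enabled builds.
  return "encode";
#else
  // Non-CUDA build: nothing to skip, outputs are already on the CPU.
  return "";
#endif
}
```

A caller could pass a non-empty result to the skip_copy_output_to_cpu_for_method option and do nothing otherwise, which keeps the non-CUDA build free of CUDA-specific configuration.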

Motivation

When running multi-method models (e.g., prefill + decode for LLMs, or encoder + decoder for ASR), some methods benefit from keeping outputs on GPU to avoid unnecessary memory copies between methods. This change enables fine-grained control over which method(s) skip the GPU→CPU copy.

Test Plan

  • Build with CUDA enabled
  • Verify that the skip_copy_output_to_cpu_for_method option skips the GPU→CPU copy for the specified method
  • Verify other methods still copy outputs to CPU by default

pytorch-bot bot commented Dec 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16235

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit aa293f9 with merge base 33ec615:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Dec 12, 2025
@larryliu0820 force-pushed the avoid_copy_output branch 2 times, most recently from f3e30a4 to d1df034 December 13, 2025 00:09
@larryliu0820 added the release notes: desktop label Dec 13, 2025
@larryliu0820 changed the title from "Avoid copying output from GPU to CPU" to "Avoid copying output from GPU to CPU for ASR runner" Dec 13, 2025
@larryliu0820 marked this pull request as ready for review December 13, 2025 00:22
@larryliu0820 temporarily deployed to upload-benchmark-results December 13, 2025 01:25, with GitHub Actions (Inactive)
Labels

  • CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
  • release notes: desktop (for desktop/laptop workstream)
