Add QNN Stable Diffusion intermediate diagnostics #154
Draft
okikankyo wants to merge 4 commits into
Thanks for the findings. I agree that the problem is most likely in the UNet. I'm going to run an on-device test to get to the bottom of this.
Summary
Adds Snapdragon X Elite / Windows ARM64 / Qualcomm QNN diagnostic tooling for the v0.48.0 Stable Diffusion demo path without modifying the existing `demo.py`.
This PR is a cause report rather than a fix. The main finding is that the QNN UNet's text-conditioning sensitivity is much weaker than that of a PyTorch UNet baseline under the same inputs.
Reproduction Conditions
- `qai_hub_models==0.48.0`, `onnxruntime-qnn==1.24.1`
- `v0.56.0` was not used
- `demo.py` was not modified
- `A girl taking a walk at sunset`
- `475801`
- `precompiled_qnn_onnx/w8a16` context-wrapper files
What Changed
- `text_emb` quantization instrumentation around the installed `OnnxModelTorchWrapper._prepare_inputs()` behavior.
- `0, 1, 3, 7.5, 15, 30` using the same seed and initial latent.
- `outputs/intermediate_debug/`.
Commands Run
CPU comparison against the available ONNX files is documented as skipped because those files are precompiled QNN context-wrapper ONNX assets, not portable CPU ONNX baselines.
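The `text_emb` quantization check this PR describes (float32 to uint16 and back) can be illustrated with a small round-trip sketch. The scale and zero point are the values quoted in the questions for Qualcomm. Everything else is an assumption made for illustration: the standard affine formula `q = round(x / scale) + zero_point`, the clamp to the uint16 range, the helper names, and the synthetic Gaussian embeddings. This is not the code inside `OnnxModelTorchWrapper._prepare_inputs()`.

```python
import random
import statistics

# Quantization parameters quoted in this PR for the UNet text_emb input.
SCALE = 0.00034632044844329357
ZERO_POINT = 23638
UINT16_MAX = 65535


def quantize(values):
    """Assumed affine quantization: q = round(x / scale) + zero_point, clamped to uint16."""
    return [min(UINT16_MAX, max(0, round(v / SCALE) + ZERO_POINT)) for v in values]


def dequantize(codes):
    """Inverse mapping: x = (q - zero_point) * scale."""
    return [(q - ZERO_POINT) * SCALE for q in codes]


def delta_std(a, b):
    """Population standard deviation of the element-wise difference a - b."""
    return statistics.pstdev([x - y for x, y in zip(a, b)])


# Synthetic stand-ins for the conditional and unconditional embeddings
# (hypothetical data, not taken from the PR).
rng = random.Random(0)
cond = [rng.gauss(0.0, 1.0) for _ in range(4096)]
uncond = [rng.gauss(0.0, 1.0) for _ in range(4096)]

cond_rt = dequantize(quantize(cond))
uncond_rt = dequantize(quantize(uncond))

std_before = delta_std(cond, uncond)
std_after = delta_std(cond_rt, uncond_rt)
max_roundtrip_error = max(abs(x - y) for x, y in zip(cond, cond_rt))

print(f"cond - uncond std before: {std_before:.6f}, after: {std_after:.6f}")
print(f"max round-trip error: {max_roundtrip_error:.2e} (half step: {SCALE / 2:.2e})")
```

With these parameters the representable range is roughly -8.19 to 14.51, and each in-range value moves by at most half a quantization step, so a cond/uncond difference of ordinary magnitude should survive the round trip. The same check on the real embeddings would replace the synthetic lists with the flattened `text_emb` tensors.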
Comparison Method
The QNN UNet and a PyTorch `UNet2DConditionModel` baseline from `sd2-community/stable-diffusion-2-1` were run with the same:
- `text_emb`
- latent
- timestep
The comparison measured the conditional vs. unconditional `noise_pred` delta from each UNet path.
Numerical Result
| Metric | QNN UNet | PyTorch UNet | QNN / PyTorch |
| --- | --- | --- | --- |
| `noise_cond_minus_uncond` std | 0.0091155551 | 0.17506623 | 0.05206918 |
The QNN UNet conditioning response is only about `5.2%` of the PyTorch reference response under the same text embedding, latent, and timestep.
Causes That Look Unlikely
- `text_emb` quantization loss: cond/uncond differences survive float32 to uint16 conversion and dequantization.
- `0` to `30` changes `noise_pred`, final latent, and final image.
Remaining Primary Suspect
The primary suspect is weak text-conditioning sensitivity inside the QNN UNet context path, especially:
- `unet_qairt_context.bin`
Questions For Qualcomm
- Is `unet_qairt_context.bin` expected to preserve Stable Diffusion v2.1 cross-attention sensitivity at roughly the same magnitude as the PyTorch UNet baseline?
- `precompiled_qnn_onnx/w8a16` UNet context for Stable Diffusion v2.1 where text conditioning becomes weak while noise output remains active?
- `unet_qairt_context.bin`?
- Should `text_emb` be supplied as UINT16 using scale `0.00034632044844329357` and zero point `23638`, or is there any additional preprocessing expected by the QNN context?
Key Files
- `stable_diffusion_windows_py/diagnose_intermediate_qnn.py`
- `stable_diffusion_windows_py/diagnose_guidance_sensitivity_qnn.py`
- `stable_diffusion_windows_py/compare_qnn_unet_reference.py`
- `stable_diffusion_windows_py/outputs/intermediate_debug/report.md`
- `stable_diffusion_windows_py/outputs/intermediate_debug/unet_reference_compare.md`
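For readers who want to re-check the headline number, the sketch below shows one way the conditioning-response metric from the Numerical Result section could be computed and compared. It is not the code in `compare_qnn_unet_reference.py`. The function names, the use of a population standard deviation, and the toy tensors are assumptions; only the two reported standard deviations come from this PR.

```python
import statistics


def conditioning_response(noise_cond, noise_uncond):
    """Population std of the element-wise delta noise_cond - noise_uncond.

    Inputs are flattened noise_pred tensors given as equal-length sequences.
    """
    return statistics.pstdev([c - u for c, u in zip(noise_cond, noise_uncond)])


def response_ratio(qnn_std, reference_std):
    """QNN conditioning response as a fraction of the reference response."""
    return qnn_std / reference_std


# Standard deviations reported in the Numerical Result section.
qnn_std = 0.0091155551
pytorch_std = 0.17506623
ratio = response_ratio(qnn_std, pytorch_std)
print(f"QNN / PyTorch = {ratio:.8f} ({ratio:.1%})")

# Toy flattened tensors (hypothetical values) showing how the metric is formed.
toy_cond = [0.10, -0.20, 0.30, 0.05]
toy_uncond = [0.08, -0.15, 0.25, 0.05]
toy_response = conditioning_response(toy_cond, toy_uncond)
print(f"toy conditioning response: {toy_response:.6f}")
```

Dividing the two reported values reproduces the ratio quoted in the report, which confirms that the 5.2% figure is the QNN response divided by the PyTorch response.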