retune lowVramPatch VRAM accounting #11173
Conversation
In the lowvram case, the patch math now runs in the model dtype, post de-quantization, so account for that in the VRAM estimate. Patching was also put back on the compute stream, which takes it off-peak, so relax the MATH_FACTOR to only x2 and drop the worst-case assumption that everything peaks at once.
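Roughly, the retuned estimate looks like the sketch below. This is an approximation, not the exact ComfyUI code: the function name and surrounding structure are assumed, and the None fallback for manual_cast_dtype reflects the review note further down this thread.

```python
import torch

# Relaxed safety multiplier: with patching back on the compute stream the
# intermediates no longer all peak at once, so x2 instead of the old worst case.
LOWVRAM_PATCH_ESTIMATE_MATH_FACTOR = 2

def low_vram_patch_estimate(model, weight):
    # Size the per-weight scratch estimate by the dtype the patch math actually
    # runs in after de-quantization, rather than always assuming float32.
    if weight is None:
        return 0
    model_dtype = getattr(model, "manual_cast_dtype", None) or torch.float32
    return weight.numel() * model_dtype.itemsize * LOWVRAM_PATCH_ESTIMATE_MATH_FACTOR
```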
@rattus128 Something broke (at least with Qwen Image Edit). If I add a LoRA, it throws this error (full ComfyUI Error Report with error details, stack trace, system information, devices, and logs attached).
```diff
 if weight is None:
     return 0
-return weight.numel() * torch.float32.itemsize * LOWVRAM_PATCH_ESTIMATE_MATH_FACTOR
+model_dtype = getattr(model, "manual_cast_dtype", torch.float32)
```
manual_cast_dtype can be None, so this code needs to either exclude that case or add an or torch.float32 on the end. The getattr default only fires if the attribute doesn't exist, not if it's None, so this PR is currently erroring for users.
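A minimal sketch of the suggested fix, keeping the same names as the diff above:

```python
# getattr's default only applies when the attribute is missing entirely;
# the `or` also covers the case where manual_cast_dtype exists but is None.
model_dtype = getattr(model, "manual_cast_dtype", None) or torch.float32
```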
RTX 3060, flux2 fp8 with LoRA:
Before:
After: