
Conversation

@rattus128 (Contributor)

This is hopefully the full root-cause fix for:

#10891

Primary commit message:

commit 53bd09926cf0f680d0fd67afcb2d0a289d71940d
Author: Rattus <[email protected]>
Date:   Sun Dec 7 21:23:05 2025 +1000

    Account for dequantization and type-casts in offload costs
    
    When measuring the cost of offload, identify weights that need a type
    change or dequantization and add the size of the conversion result
    to the offload cost.
    
    This is mutually exclusive with lowvram patches, which already
    have a large conservative estimate that won't overlap the dequant
    cost, so don't double count.
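
To make the accounting concrete, here is a minimal sketch of the idea in Python (hypothetical names like weight_offload_cost; not the actual ComfyUI implementation):

import torch

def weight_offload_cost(weight, compute_dtype, has_lowvram_patches):
    # Base cost: the bytes the stored weight itself occupies.
    cost = weight.numel() * weight.element_size()
    if has_lowvram_patches:
        # lowvram patches already carry a large conservative estimate,
        # so the dequant/cast cost must not be counted twice.
        return cost
    if weight.dtype != compute_dtype:
        # A cast or dequantization materializes a second tensor at the
        # compute dtype; add the size of that conversion result.
        cost += weight.numel() * torch.empty((), dtype=compute_dtype).element_size()
    return cost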

Example test case:

RTX3060 Flux2 workflow with ModelComputeDtype node set to fp32

Before:

Requested to load Flux2TEModel_
loaded partially; 10508.42 MB usable, 10025.59 MB loaded, 7155.01 MB offloaded, 480.00 MB buffer reserved, lowvram patches: 0
!!! Exception during processing !!! Allocation on device 
...
    x = self.mlp(x)
        ^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/text_encoders/llama.py", line 327, in forward
    return self.down_proj(self.activation(self.gate_proj(x)) * self.up_proj(x))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ops.py", line 608, in forward
    return self.forward_comfy_cast_weights(input, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ops.py", line 599, in forward_comfy_cast_weights
    weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ops.py", line 127, in cast_bias_weight
    weight = weight.dequantize()
             ^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/quant_ops.py", line 197, in dequantize
    return LAYOUTS[self._layout_type].dequantize(self._qdata, **self._layout_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/quant_ops.py", line 431, in dequantize
    plain_tensor = torch.ops.aten._to_copy.default(qdata, dtype=orig_dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/_ops.py", line 841, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: Allocation on device 

Got an OOM, unloading all loaded models.
Prompt executed in 7.47 seconds
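
The missing term is easy to quantify: dequantizing to fp32 allocates four bytes per element on top of the stored weight, and the old estimate never counted that allocation. For illustration only (made-up shape, assuming a 1-byte-per-element quantized layout):

import torch

# Illustrative numbers: one 4096x4096 weight stored at 1 byte/element.
numel = 4096 * 4096
stored = numel * 1  # what the old estimate counted
dequant = numel * torch.empty((), dtype=torch.float32).element_size()  # the allocation that OOMed
print(f"{stored / 2**20:.0f} MiB stored, {dequant / 2**20:.0f} MiB extra at fp32")
# -> 16 MiB stored, 64 MiB extra at fp32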

After:

[image: screenshot of the result after the fix]

Other commit messages in this PR:

Handle the case where the attribute doesn't exist by returning a static
sentinel (distinct from None). If the sentinel is passed in as the set
value, del the attr.

So that the loader can know the size of weights for dequant accounting.
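
The sentinel behaviour described above can be sketched as follows (hypothetical helper names, not the actual ComfyUI code):

_MISSING = object()  # static sentinel, distinct from None

def get_attr(obj, name):
    # A missing attribute reads as the sentinel instead of raising.
    return getattr(obj, name, _MISSING)

def set_attr(obj, name, value):
    # Setting the sentinel back means "the attribute did not exist":
    # delete it rather than storing the sentinel object.
    if value is _MISSING:
        if hasattr(obj, name):
            delattr(obj, name)
    else:
        setattr(obj, name, value)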
@Balladie (Contributor) commented Dec 7, 2025

Confirmed that it resolves #10891 (comment).
