Commit 60ee574
authored
retune lowVramPatch VRAM accounting (#11173)
In the lowvram case, this now does its math in the model dtype in the
post de-quantization domain. Account for that. The patching was also
put back on the compute stream getting it off-peak so relax the
MATH_FACTOR to only x2 so get out of the worst-case assumption of
everything peaking at once.1 parent 8e889c5 commit 60ee574
1 file changed
+3
-3
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
132 | 132 | | |
133 | 133 | | |
134 | 134 | | |
135 | | - | |
136 | | - | |
| 135 | + | |
137 | 136 | | |
138 | 137 | | |
139 | 138 | | |
140 | 139 | | |
141 | 140 | | |
142 | | - | |
| 141 | + | |
| 142 | + | |
143 | 143 | | |
144 | 144 | | |
145 | 145 | | |
| |||
0 commit comments