Conversation

@KohakuBlueleaf (Contributor)

In this PR I propose a mechanism for resolution bucketing. Unlike standard Aspect Ratio Bucketing, we allow the user to input latents of arbitrary resolution: bucketing is done directly on the list of latents, which are assumed to already have the size the user expects.
(In this PR we also added a "ResizeToPixelCount" node, which can mimic the effect of standard ARB.)
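
Roughly, the bucketing idea looks like this (an illustrative sketch, not the actual node code; the names here are mine):

```python
# Illustrative sketch of the bucketing idea, not the actual node code.
from collections import defaultdict
import torch

def bucket_latents(latents: list[torch.Tensor], batch_size: int):
    """Group latents by (H, W) and yield uniform-resolution batches.

    Latents are assumed to already be at the size the user expects
    (e.g. produced by a ResizeToPixelCount-style node upstream).
    """
    buckets: dict[tuple[int, int], list[torch.Tensor]] = defaultdict(list)
    for lat in latents:  # each latent: (C, H, W)
        buckets[(lat.shape[-2], lat.shape[-1])].append(lat)

    for (h, w), group in buckets.items():
        for i in range(0, len(group), batch_size):
            # All tensors in a bucket share a shape, so stacking is safe.
            yield torch.stack(group[i : i + batch_size])
```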

Besides the resolution bucketing work, we also fixed the issue in #10940, where missing data movement left tensors on the wrong device.
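
The fix is, in spirit, the usual PyTorch pattern of moving the batch onto the model's device before use (illustrative only, not the exact code in the PR):

```python
# Illustrative only: make sure the latent batch lives on the same
# device (and dtype) as the model before the forward pass.
batch = batch.to(device=model_device, dtype=model_dtype, non_blocking=True)
```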

As for the trainer refactoring, each task (such as adapter creation and the training step in different modes) is now split into separate functions to improve maintainability.
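
A rough sketch of the resulting shape (the function names here are hypothetical, not the actual trainer code):

```python
# Hypothetical structure: one small function per task instead of a
# single monolithic training loop, dispatched by mode.
def create_adapter(model, rank, alpha): ...
def train_step_flow(model, batch): ...       # flow-matching objective
def train_step_diffusion(model, batch): ...  # eps-/v-prediction objective

TRAIN_STEP = {"flow": train_step_flow, "diffusion": train_step_diffusion}

def training_loop(model, loader, mode):
    step_fn = TRAIN_STEP[mode]  # resolve once, keep the loop generic
    for batch in loader:
        loss = step_fn(model, batch)  # each step computes its own loss
```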

We also use a custom TrainGuider with a modified load-model helper to allow custom control over the loading behavior.

TL;DR:

@MeiYi-dev commented Dec 5, 2025

I don't know where to ask this, but does your LoRA trainer support every model that ComfyUI supports? Also, can you please include optimizers from this project: https://pytorch-optimizers.readthedocs.io/en/latest/? (Some of the optimizers there greatly reduce VRAM consumption.)

@KohakuBlueleaf (Contributor, Author) commented Dec 5, 2025

> I don't know where to ask this, but does your LoRA trainer support every model that ComfyUI supports? Also, can you please include optimizers from this project: https://pytorch-optimizers.readthedocs.io/en/latest/? (Some of the optimizers there greatly reduce VRAM consumption.)

  1. I can't personally ensure that every model ComfyUI supports works in the LoRA trainer, but based on earlier testing the answer should be "yes" if you are training for images. (Some video models support image generation, like Wan 2.1 or 2.2, and those two are supported in the LoRA trainer as well.)
     • If you run into any problem with LoRA training on some model, you can open an issue directly and ping me.
  2. I can provide a template for custom optimizer support via an extension, or you can open a PR for the optimizers you want to use. But I should point out that in LoRA training, the optimizer and the LoRA occupy far, far less VRAM than the model itself, so changing the optimizer usually helps very little. Some rough numbers below illustrate this.
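
The rough numbers (ballpark assumptions, not measurements from this PR):

```python
# Back-of-the-envelope comparison with assumed sizes, for illustration.
base_params = 2.6e9   # e.g. an SDXL-class UNet (assumed)
lora_params = 30e6    # a typical low-rank LoRA, order of magnitude (assumed)

model_vram = base_params * 2        # bf16 weights: ~5.2 GB
lora_vram = lora_params * 2         # ~60 MB
adam_states = lora_params * 4 * 2   # two fp32 moments per LoRA param: ~240 MB
print(f"model ≈ {model_vram / 1e9:.1f} GB, "
      f"lora ≈ {lora_vram / 1e6:.0f} MB, "
      f"optimizer ≈ {adam_states / 1e6:.0f} MB")
```

So even an optimizer that halves its state memory saves on the order of 100 MB, next to multiple GB of model weights.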

@MeiYi-dev commented Dec 5, 2025

> 1. I can't personally ensure that every model ComfyUI supports works in the LoRA trainer, but based on earlier testing the answer should be "yes" if you are training for images. […]
> 2. I can provide a template for custom optimizer support via an extension, or you can open a PR for the optimizers you want to use. But I should point out that in LoRA training, the optimizer and the LoRA occupy far, far less VRAM than the model itself, so changing the optimizer usually helps very little.

Thank you for the answers. I have some more questions/requests for LoRA training in ComfyUI.

For 1: The video models do support image training as well, though that's almost useless since video models are mostly used for animation. If training with videos gets implemented, I would even consider dropping musubi.

For 2: Some optimizers are "schedule-free" and some don't require any hyperparameters, so having access to extra optimizers would be very nice.
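
For example, usage would look roughly like this (a sketch based on my reading of the pytorch_optimizer docs; the ScheduleFreeAdamW name and its train()/eval() switches should be verified against the library):

```python
# Hedged sketch: assumes the pytorch_optimizer package exports
# ScheduleFreeAdamW; check the library docs for exact names.
from pytorch_optimizer import ScheduleFreeAdamW

optimizer = ScheduleFreeAdamW(lora_parameters, lr=1e-3)  # no LR scheduler needed

optimizer.train()   # schedule-free optimizers track a train/eval mode
for batch in loader:
    loss = train_step(model, batch)  # placeholder training step
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
optimizer.eval()    # switch modes before validating or saving weights
```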

As for the new requests, there are some training optimizations that GREATLY improve LoRA training speed, both in iteration speed and convergence time:

Relatively simple to implement:
https://github.com/compvis/tread | https://github.com/vita-epfl/LayerSync | https://github.com/vvvvvjdy/SRA

Slightly harder to implement:
https://github.com/Martinser/REG

There are some more mentioned here: https://x.com/SwayStar123/status/1994673352754270318

Even just TREAD, which seems to be the easiest to implement and has been tested for LoRA training, would GREATLY improve convergence and iteration speed.
It would be awesome if some of these get implemented.
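
For reference, my simplified reading of the TREAD idea (heavily condensed; the real method applies only at training time and has routing-rate details I'm glossing over):

```python
# Simplified sketch of TREAD-style token routing: during training, a
# random subset of tokens bypasses the middle blocks and is merged back,
# so those blocks process fewer tokens per step.
import torch

def tread_forward(blocks, x, keep_ratio=0.5, route_start=2, route_end=-2):
    B, N, D = x.shape
    for blk in blocks[:route_start]:           # early blocks see all tokens
        x = blk(x)

    n_keep = max(1, int(N * keep_ratio))
    idx = torch.rand(B, N, device=x.device).argsort(dim=1)[:, :n_keep]
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, D)
    routed = torch.gather(x, 1, gather_idx)    # tokens that stay on the path

    for blk in blocks[route_start:route_end]:  # middle blocks see a subset
        routed = blk(routed)

    x = x.scatter(1, gather_idx, routed)       # merge routed tokens back
    for blk in blocks[route_end:]:             # late blocks see all tokens
        x = blk(x)
    return x
```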

@KohakuBlueleaf (Contributor, Author)

> Thank you for the answers. I have some more questions/requests for LoRA training in ComfyUI. […] Even just TREAD, which seems to be the easiest to implement, would GREATLY improve convergence and iteration speed. It would be awesome if some of these get implemented.

Please open an issue/discussion or a PR for your requests; please don't keep posting things unrelated to this PR/thread.
Thanks

@bezo97 (Contributor) commented Dec 8, 2025

Hi, I tested this PR, here's my feedback:

  • The mentioned bug is fixed; I can now start the training 👍
  • I tried the new resolution bucketing feature; it works well 👍
  • Found a bug; I can't tell whether it was introduced in this PR or earlier, so here it is: disabling gradient_checkpointing causes the training steps to complete very fast without the LoRA actually being trained, with no errors. The steps iterate, but there's no GPU utilization. I can file a separate issue if this is unrelated to the PR (please verify). A quick repro-check sketch follows below.
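
A minimal way to check (names like train_step and lora_parameters are placeholders): if disabling checkpointing leaves the loss detached from the autograd graph, backward() silently does nothing, which would match the fast steps and idle GPU.

```python
# Hedged diagnostic: a detached loss makes backward() a silent no-op.
loss = train_step(model, batch)  # placeholder for the trainer's step
assert loss.requires_grad, "loss is detached; backward() will do nothing"
loss.backward()
assert any(p.grad is not None for p in lora_parameters), \
    "no gradients reached the LoRA parameters"
```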

@KohakuBlueleaf (Contributor, Author)

> Found a bug; I can't tell whether it was introduced in this PR or earlier. Disabling gradient_checkpointing causes the training steps to complete very fast without the LoRA actually being trained […]

Will check this issue soon
