Conversation

@KohakuBlueleaf (Contributor)

In this PR I propose a mechanism for resolution bucketing. Unlike standard Aspect Ratio Bucketing, we allow the user to input latents of arbitrary resolution: bucketing is done directly on the list of latents, which are assumed to already have the size the user expects.
(In this PR we also added a "ResizeToPixelCount" node, which can mimic the effect of standard ARB.)
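
Roughly, the bucketing idea looks like this (an illustrative sketch, not the actual node code; the names here are mine):

```python
# Illustrative sketch of the bucketing idea, not the actual node code.
from collections import defaultdict
import torch

def bucket_latents(latents: list[torch.Tensor], batch_size: int):
    """Group latents by (H, W) and yield uniform-resolution batches.

    Latents are assumed to already be at the size the user expects
    (e.g. produced by a ResizeToPixelCount-style node upstream).
    """
    buckets: dict[tuple[int, int], list[torch.Tensor]] = defaultdict(list)
    for lat in latents:  # each latent: (C, H, W)
        buckets[(lat.shape[-2], lat.shape[-1])].append(lat)

    for (h, w), group in buckets.items():
        for i in range(0, len(group), batch_size):
            # All tensors in a bucket share a shape, so stacking is safe.
            yield torch.stack(group[i : i + batch_size])
```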

Besides the resolution bucketing work, we also fixed the issue in #10940, where missing data movement left tensors on the wrong device.
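
The fix is, in spirit, the usual PyTorch pattern of moving the batch onto the model's device before use (illustrative only, not the exact code in the PR):

```python
# Illustrative only: make sure the latent batch lives on the same
# device (and dtype) as the model before the forward pass.
batch = batch.to(device=model_device, dtype=model_dtype, non_blocking=True)
```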

As for the trainer refactoring, each task (such as adapter creation and the training step in different modes) is now split into separate functions to improve maintainability.
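
A rough sketch of the resulting shape (the function names here are hypothetical, not the actual trainer code):

```python
# Hypothetical structure: one small function per task instead of a
# single monolithic training loop, dispatched by mode.
def create_adapter(model, rank, alpha): ...
def train_step_flow(model, batch): ...       # flow-matching objective
def train_step_diffusion(model, batch): ...  # eps-/v-prediction objective

TRAIN_STEP = {"flow": train_step_flow, "diffusion": train_step_diffusion}

def training_loop(model, loader, mode):
    step_fn = TRAIN_STEP[mode]  # resolve once, keep the loop generic
    for batch in loader:
        loss = step_fn(model, batch)  # each step computes its own loss
```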

We also use a custom TrainGuider with a modified load-model helper to allow custom control over the loading behavior.

TL;DR:

@MeiYi-dev commented Dec 5, 2025

I don't know where to ask this, but does your LoRA trainer support every model that ComfyUI supports? Also, can you please include optimizers from this project: https://pytorch-optimizers.readthedocs.io/en/latest/? (Some of the optimizers there greatly reduce VRAM consumption.)

@KohakuBlueleaf (Contributor, Author) commented Dec 5, 2025

> I don't know where to ask this, but does your LoRA trainer support every model that ComfyUI supports? Also, can you please include optimizers from this project: https://pytorch-optimizers.readthedocs.io/en/latest/? (Some of the optimizers there greatly reduce VRAM consumption.)

  1. I can't personally ensure that every model ComfyUI supports works in the LoRA trainer, but based on earlier testing the answer should be "yes" if you are training for images. (Some video models support image generation, like Wan 2.1 or 2.2, and those two are supported in the LoRA trainer as well.)
     • If you run into any problem with LoRA training on some model, you can open an issue directly and ping me.
  2. I can provide a template for custom optimizer support via an extension, or you can open a PR for the optimizers you want to use. But I should point out that in LoRA training, the optimizer and the LoRA occupy far, far less VRAM than the model itself, so changing the optimizer usually helps very little. Some rough numbers below illustrate this.
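
The rough numbers (ballpark assumptions, not measurements from this PR):

```python
# Back-of-the-envelope comparison with assumed sizes, for illustration.
base_params = 2.6e9   # e.g. an SDXL-class UNet (assumed)
lora_params = 30e6    # a typical low-rank LoRA, order of magnitude (assumed)

model_vram = base_params * 2        # bf16 weights: ~5.2 GB
lora_vram = lora_params * 2         # ~60 MB
adam_states = lora_params * 4 * 2   # two fp32 moments per LoRA param: ~240 MB
print(f"model ≈ {model_vram / 1e9:.1f} GB, "
      f"lora ≈ {lora_vram / 1e6:.0f} MB, "
      f"optimizer ≈ {adam_states / 1e6:.0f} MB")
```

So even an optimizer that halves its state memory saves on the order of 100 MB, next to multiple GB of model weights.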

@MeiYi-dev commented Dec 5, 2025

> 1. I can't personally ensure that every model ComfyUI supports works in the LoRA trainer, but based on earlier testing the answer should be "yes" if you are training for images. […]
> 2. I can provide a template for custom optimizer support via an extension, or you can open a PR for the optimizers you want to use. But I should point out that in LoRA training, the optimizer and the LoRA occupy far, far less VRAM than the model itself, so changing the optimizer usually helps very little.

Thank you for the answers. I have some more questions/requests for LoRA training in ComfyUI.

For 1: The video models do support image training as well, though that's almost useless since video models are mostly used for animation. If training with videos gets implemented, I would even consider dropping musubi.

For 2: Some optimizers are "schedule-free" and some don't require any hyperparameters, so having access to extra optimizers would be very nice.
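
For example, usage would look roughly like this (a sketch based on my reading of the pytorch_optimizer docs; the ScheduleFreeAdamW name and its train()/eval() switches should be verified against the library):

```python
# Hedged sketch: assumes the pytorch_optimizer package exports
# ScheduleFreeAdamW; check the library docs for exact names.
from pytorch_optimizer import ScheduleFreeAdamW

optimizer = ScheduleFreeAdamW(lora_parameters, lr=1e-3)  # no LR scheduler needed

optimizer.train()   # schedule-free optimizers track a train/eval mode
for batch in loader:
    loss = train_step(model, batch)  # placeholder training step
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
optimizer.eval()    # switch modes before validating or saving weights
```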

As for the new requests, there are some training optimizations that GREATLY improve LoRA training speed, both in iteration speed and convergence time:

Relatively simple to implement:
https://github.com/compvis/tread | https://github.com/vita-epfl/LayerSync | https://github.com/vvvvvjdy/SRA

Slightly harder to implement:
https://github.com/Martinser/REG

There are some more mentioned here: https://x.com/SwayStar123/status/1994673352754270318

Even just TREAD, which seems to be the easiest to implement and has been tested for LoRA training, would GREATLY improve convergence and iteration speed.
It would be awesome if some of these get implemented.
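
For reference, my simplified reading of the TREAD idea (heavily condensed; the real method applies only at training time and has routing-rate details I'm glossing over):

```python
# Simplified sketch of TREAD-style token routing: during training, a
# random subset of tokens bypasses the middle blocks and is merged back,
# so those blocks process fewer tokens per step.
import torch

def tread_forward(blocks, x, keep_ratio=0.5, route_start=2, route_end=-2):
    B, N, D = x.shape
    for blk in blocks[:route_start]:           # early blocks see all tokens
        x = blk(x)

    n_keep = max(1, int(N * keep_ratio))
    idx = torch.rand(B, N, device=x.device).argsort(dim=1)[:, :n_keep]
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, D)
    routed = torch.gather(x, 1, gather_idx)    # tokens that stay on the path

    for blk in blocks[route_start:route_end]:  # middle blocks see a subset
        routed = blk(routed)

    x = x.scatter(1, gather_idx, routed)       # merge routed tokens back
    for blk in blocks[route_end:]:             # late blocks see all tokens
        x = blk(x)
    return x
```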

@KohakuBlueleaf (Contributor, Author)

> Thank you for the answers. I have some more questions/requests for LoRA training in ComfyUI. […] Even just TREAD, which seems to be the easiest to implement, would GREATLY improve convergence and iteration speed. It would be awesome if some of these get implemented.

Please open an issue/discussion or a PR for your requests; please don't keep posting things unrelated to this PR/thread.
Thanks

@bezo97 (Contributor) commented Dec 8, 2025

Hi, I tested this PR, here's my feedback:

  • The mentioned bug is fixed; I can now start the training 👍
  • I tried the new resolution bucketing feature; it works well 👍
  • Found a bug; I can't tell whether it was introduced in this PR or earlier, so here it is: disabling gradient_checkpointing causes the training steps to complete very fast without the LoRA actually being trained, with no errors. The steps iterate, but there's no GPU utilization. I can file a separate issue if this is unrelated to the PR (please verify). A quick repro-check sketch follows below.
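
A minimal way to check (names like train_step and lora_parameters are placeholders): if disabling checkpointing leaves the loss detached from the autograd graph, backward() silently does nothing, which would match the fast steps and idle GPU.

```python
# Hedged diagnostic: a detached loss makes backward() a silent no-op.
loss = train_step(model, batch)  # placeholder for the trainer's step
assert loss.requires_grad, "loss is detached; backward() will do nothing"
loss.backward()
assert any(p.grad is not None for p in lora_parameters), \
    "no gradients reached the LoRA parameters"
```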

@KohakuBlueleaf (Contributor, Author)

> Found a bug; I can't tell whether it was introduced in this PR or earlier. Disabling gradient_checkpointing causes the training steps to complete very fast without the LoRA actually being trained […]

Will check this issue soon
