[Feat] Add support for distributed training#257
[Feat] Add support for distributed training#257tchaton wants to merge 11 commits intolearnables:masterfrom
Conversation
seba-1511
left a comment
There was a problem hiding this comment.
Thanks a lot for the PR @tchaton, and for the preliminary review @pietrolesci.
Some questions and minor remarks:
- Do those changes remove the strict requirement for pytorch_lightning==1.0.2? (If yes, to be changed in lightning_episodic_module.py:16).
- List contributions (crediting to yourself) in CHANGELOG.md?
|
|
||
| def train_dataloader(self): | ||
| return EpisodicBatcher.epochify( | ||
| return Epochifier( |
There was a problem hiding this comment.
Was Epochifier inadvertently removed? Or should it be replaced by TaskDataParallel?
Also, if I remember correctly Epochifier was preventing us from using pytorch_lightning > 1.0.2. I believe the fix was just to inherit from a data class in lightning. @nightlessbaron do you remember which class and if that was enough?
There was a problem hiding this comment.
@seba-1511, quick answer to your first question: Epochifier is replaced by TaskDataParallel
There was a problem hiding this comment.
@seba-1511, as @pietrolesci mentioned, in the benchmark, we replaced Epochifier with TaskDistributedParallel. The downside is windows users can't make use of it as stated in #5358.
I believe the fix was just to inherit from a data class in lightning.
I couldn't figure it out. That is still unresolved.
There was a problem hiding this comment.
Hey @seba-1511, yes it was inadvertently removed.
However, I added a TODO. It should be provided replaced by the TaskDataParallel, TaskDistributedDataParallel when tested on your side.
I will copy those files on the Flash side to no block the integration until the next learn2learn release.
Best,
T.C
Description
This PR adds support for computing support on new data for running inference. This is only supported by LightningPrototypicalNetworks right now, as the classifier needs re-fitting.
This is used for Lightning Flash learn2learn integration: Lightning-Universe/lightning-flash#737
Replace this line by a one sentence summary of your PR.
If necessary, use the following space to provide context or more details.
Contribution Checklist
If your contribution modifies code in the core library (not docs, tests, or examples), please fill the following checklist.
Optional
If you make major changes to the core library, please run
make alltestsand copy-paste the content ofalltests.txtbelow.