DDP for FineTuning #812
Conversation
Code Review
This pull request introduces Distributed Data Parallel (DDP) support for fine-tuning, which is a great enhancement for multi-GPU training. The implementation is thorough, covering distributed sampling, metric synchronization, and efficient model state saving. I've found one area for improvement regarding optimizer initialization in the DDP setup for better consistency and adherence to best practices.
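For context, here is a minimal sketch of the pieces the review mentions (distributed sampling, metric synchronization, rank-0 checkpointing, and optimizer construction after the DDP wrap), assuming PyTorch's `torch.distributed` API and a `torchrun`-style launcher. The toy model and dataset are placeholders, not the PR's actual code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model standing in for the real fine-tuning model.
    model = torch.nn.Linear(16, 1).to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Build the optimizer after the DDP wrap so every rank constructs it
    # over identical parameters (the consistency point raised in the review).
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # DistributedSampler shards the data so each rank trains on a distinct slice.
    dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        # Metric synchronization: average the last loss across all ranks.
        loss_t = loss.detach().clone()
        dist.all_reduce(loss_t, op=dist.ReduceOp.SUM)
        loss_t /= dist.get_world_size()

    # Save the unwrapped module's state dict from rank 0 only.
    if dist.get_rank() == 0:
        torch.save(model.module.state_dict(), "finetuned.pt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```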
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
@psinger-prior sorry, I completely missed this PR, probably due to the war. @anuragg1209 would you be OK with reviewing it?
Hi @alanprior, yes, this PR review is on my to-do list. Please feel free to unsubscribe.
Hi @psinger-prior, I just have a few comments to add. |
anuragg1209
left a comment
All comments addressed! LGTM! Thanks @psinger-prior for the PR!
Issue
Closing #809
Motivation and Context
Public API Changes
How Has This Been Tested?
Running the example scripts on single-GPU and multi-GPU nodes.
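For reference, a multi-GPU run of such a script is typically launched with something like `torchrun --nproc_per_node=4 <example_script>.py`, while the single-GPU case runs the same script directly with `python`; the exact script names depend on the repository's examples and are not spelled out in this PR.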
Checklist
Changelog entry added (see changelog/README.md), or "no changelog needed" label requested.