Shard datasets during training so that RAM usage is independent of world size #18

Open
HastingsGreer wants to merge 51 commits into HastingsGreer:master from uncbiag:shard-datasets

@HastingsGreer

Owner

No description provided.
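
Since the pull request has no description, the following is only a minimal sketch of the general idea named in the title, not the code in this branch. It assumes a strided split: each training process keeps only every `world_size`-th item, so the total memory held across all processes stays roughly one copy of the dataset no matter how many processes are launched. The names `shard_indices` and `load_shard` are hypothetical and do not come from the repository.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_indices(num_items: int, rank: int, world_size: int) -> List[int]:
    """Return the item indices that process `rank` is responsible for.

    Hypothetical helper: a strided split, so shards are disjoint and
    together cover every index exactly once.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_items, world_size))


def load_shard(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Keep only this rank's slice instead of the whole dataset.

    Each process then holds about len(items) / world_size entries.
    """
    return [items[i] for i in shard_indices(len(items), rank, world_size)]


# Small demonstration with placeholder case names (not real data).
cases = [f"case_{i:03d}" for i in range(10)]
for rank in range(4):
    print(rank, load_shard(cases, rank, world_size=4))
```

In a real distributed run, `rank` and `world_size` would typically come from the launcher (for example via `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`); the sketch takes them as plain arguments so that it runs without any dependencies.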

HastingsGreer and others added 30 commits September 23, 2022 17:49
* asdf

* begin to harmonize training procedure

* final training pipe

* Rename preprocess_train_knees.py to preprocess_train_fullres_knees.py

* Create preprocess_train_halfres_knees.py

* details

* brain unstripped

* update old eval notebook

Co-authored-by: Thomas Greer <tgreer@biag-w05.cs.unc.edu>
* cvpr start

* Update TODO.md

* Update TODO.md

* Update TODO.md

* Update TODO.md

* Update and rename TODO.md to README.md

* OAI eval script done

* brain training script: testing whether both steps work in one script

* ugh

* visualize training

* training?

* Update cvpr_network.py

* don't say validation

* normal log freq

* Add train config for lung dataset.

* fix shuffle name

* HCP eval

* progress

* Adding support for different dimensions.

* Fix the error "CPU tensor cannot be gathered" when using flips() for 2D and 1D data on 4 GPUs.

* 1. Fix bug - GradientICONSparse running error when applied to 2D images.
2. Add framework parameter to train_two_stage so that we can run the training process on ICON as well.

* Add ablation study script for comparing the training on different resolutions.

* fix preprocess

* script for COPDGene_eval

* batch size + switching images

* HCP ants eval

* COPDGene_eval.py

* OAI_ants_eval_needs_work

* Get code for bending energy or velocity field ablation into the cvpr branch (#50)

* Update network_wrappers.py

* Update network_wrappers.py

* Update network_wrappers.py

* Update networks.py

* Update network_wrappers.py

* Update train.py

* Update losses.py

* name mistake

* crimes

* explain bizarre code in comment

* Update network_wrappers.py

* Update network_wrappers.py

* Update network_wrappers.py

* Update network_wrappers.py

* Add learn2reg abdomenCTCT and NLST dataset helper function to icon data script.

* Add train script for the learn2reg AbdomenCTCT registration task and NLST task.

* Add copdgene train set to data script.

* Add train script for network capacity and network structure ablation study.

* evaluate OAI at half resolution to match prior work

* Add experiment script for comparing convergence speed between icon and gradICON.

* Add option to flips function so that it can print foldings as a percentage.

* Add evaluation script for learn2reg abdomenCTCT registration.

* Fix the bug when normalizing the intensity to [0,1].

* Fix bug: footsteps was initialized twice because utils initializes footsteps when imported.

* Add evaluation script for learn2reg NLST task.

* training scripts for abdomen and learn2reg lung

* abdomen eval fixes

* Add support for specifying output folder via argument list in network structure ablation study scripts.

* folds

* update comparison regularizers

* Update losses.py (#51)

* OAI_eval with torch.grid_sample

* real grid_sample test

* chunkin along

* Asdfafdsa

* synthmorph

* synthmorph

* Add HCP evaluation script for synthmorph.

* Add folding computation into the script.

* Fix Bending Energy.

* Experiment of comparing regularizers with varying lambdas.

* Plotting convergence speed comparison between ICON and GradICON.

* Update requirements.txt

* Update setup.cfg

* Add model statistics computation.

* Clean up the SynthMorph evaluation code.

* Add test code for SynthMorph evaluation code.

* Clean up the notebooks of the varying lambda experiments.

* Add the reason for having a copy of VM UNet to the comments.

* Add description of how to run the model statistics computation script.

* Change the required itk version to 5.3.0

* Add the pretrained models to package.

* Add test script for brain registration.

* Fix bug: Should have used pre-trained model used in test_brain_itk.

* Loosen the test criteria for brain registration so that the test case can pass when run on CPU.

* Fix the comments so that sphinx can generate documentation.

* Unify the output of flips() function. Now the output should be a detached tensor.

Co-authored-by: Lin Tian <lintian@cs.unc.edu>
Co-authored-by: Raul <sonic1sonic@gmail.com>
* make sure we aren't scaling a signed short image to [0, 1)

* enable cast

* fix tests

* add new module to doc
* Fix bug: input_channels was not truly reflecting the number of channels of x or y when given (x,y) as the input.

* Remove input_channels argument in UNet2 to avoid potential error caused by inconsistency between the two arguments input_channels and channels.

* Refactor all the similarity losses to inherit from SimilarityBase. SimilarityBase has a member variable called isInterpolated, which indicates whether the similarity loss class requires a mask for the interpolated evaluation.

* 1. Fix bug: Should check whether inbounds_tag is None or not.
2. Add assertion to check the shape of image_A and image_B.

* Set correct shape for the inbounds_tag when images have multiple channels.

* Refactor the test script according to the SimilarityBase class.

* Refactor the test script according to the SimilarityBase class.

* Refactor the test script according to the SimilarityBase class.

* Allow using a similarity measure defined by the user.

* Keep ssd and ssd_only_interpolated for backward compatibility.

* Add itk interface for multi-modality registration task.
…arity measure with isInterpolated set to True requires the inbounds_tag to be passed as one extra channel on image_A; otherwise the similarity measure will accept the two images with the same number of channels.
2. Add freesurfer affine evaluation script.
3. Move all the helper functions to helper.py.
4. Add a prepare script so that we process the image for Synthmorph once.
I don't think we meant to keep this check after switching to the "getattr" approach
Add GradICON paper and reference.
Use input image type instead of a predefined one. Also, refactoring
to be compliant with PEP8 (79 characters max length)
ENH: Change predefined image types for input images
This is the configuration we actually used in training
HastingsGreer and others added 19 commits December 8, 2023 10:57
* Update cpu-test-action.yml

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Update cpu-test-action.yml

* Update cpu-test-action.yml

* Update setup.cfg

* Update requirements.txt

* Update setup.cfg

* Update requirements.txt

* Update setup.cfg

* Update requirements.txt

* Update setup.cfg

* Update cpu-test-action.yml
Co-authored-by: Basar Demir <bdemir@biag-gpu6.cs.unc.edu>
Put a pointer to uniGradICON and multiGradICON into the ICON readme.
Correct link to uniGradICON and multiGradICON.
* Add squared lncc and mind-ssc losses

* fix cpu error and add indexing parameter for meshgrid in mind-ssc

---------

Co-authored-by: Basar Demir <bdemir@biag-gpu6.cs.unc.edu>
* add support for loss function masking

* Change masking strategy, update naming conventions, and fix bugs

---------

Co-authored-by: Basar Demir <bdemir@biag-gpu6.cs.unc.edu>
* Update cpu-test-action.yml

* Update cpu-test-action.yml
* Move ConstrICON code into ICON. WIP

So that I stop copy-pasting it everywhere.

* Create test_constricon.py

* Update test_constricon.py

* Update test_constricon.py

* Update test_2d_registration_train.py

* Update test_constricon.py

* Update test_constricon.py

* Update test_constricon.py

* Update constricon.py

* Update constricon.py

* Update medical_training.rst

* docs work

* Update test-action.yml

* Update test-action.yml

* maybe that helps

* Update README.md

* Update medical_training.rst

* docs

* tabs

* fix tests

* doc fix

* Update medical_training.rst

* Update data.py

* work for presentation

* fixes

* a

* Update medical_training.rst

* Update medical_training.rst

* Update medical_training.rst

* Add files via upload

* Add files via upload

* updates for longleaf

* unicarl attempt

* README

* fix details

* a

* Update setup.cfg to ban naughty itk 6.0

* prepare for itk 6

* used hasattr wrong

* A

* itk regression fixed?

* datasets

* a

* is training

* Update register.py

* Update register.py
5 participants