
Tomography tutorials #4

Merged

arthurmccray merged 49 commits into main from tomography_tutorials on Mar 25, 2026

Conversation


arthurmccray marked this pull request as draft February 19, 2026 17:37

arthurmccray commented Feb 19, 2026

@cedriclim1 I created this as a place to make comments on the tutorials as I'm going through the main tomo refactor PR. So far:

  • no capital letters in directories please (/tutorials/tomography/Notebooks -> tutorials/tomography/notebooks and scripts)
  • shouldn't phantom_dataset.ipynb be something like /tomography/notebooks/tomography_00_generate_phantom.ipynb or, if there are going to be additional simulation notebooks in the future (e.g. for hyperspectral, through-focal, etc.), tomography/notebooks/simulation_notebooks/basic_phantom.ipynb
  • also the notebook for making the dataset should save the tilt_series and tilt_angles with metadata in the filenames (tilt errors and shifts so you know what the ground truth is later)
  • Fourier crop vs binning... we don't want to confuse people.
  • let's also move hyperparameter_optimization.ipynb to diffractive_imaging
  • move device setting to the import cell
  • add opening the tensorboard logger to a cell
  • when running tomography_inr.reconstruct there are a lot of unnecessary prints and it's missing a progress bar with estimated time left
  • tutorials should demonstrate saving/loading of the full Tomography object if possible, but at least the reconstructed volume and final alignment values (just a np.save is fine)
  • maybe clarify somewhere that the only difference between TomographyConventional and Lite is that you don't have to explicitly set the pixelated object model. And related, should the TomographyLiteConv accept the tilt_series and tilt_angles as well? That way the user doesn't have to initialize the required TomographyPixDataset
  • For the first time each scheduler is defined in tomography_02, I would spell out the various kwargs that are being set by defaults (e.g. start_factor, eta_min, T_max) and add comments noting the default values
  • for conventional recons, need to specify that it's doing SIRT and the kwarg for that
  • what's up with the factor of 42 scaling difference for SIRT vs INR? Is it always 42? If so it should be handled internally; if not, what's causing it?
  • generally add titles to all of your plots so it's clear what the images are (e.g. "SIRT reconstruction used for pretraining")
  • for the INR recon, some discussion of what good "standard values" are for things like num_samples_per_ray would be helpful. Are you using 100 because you're reconstructing a 100^3 volume?
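
Two of the points above (encoding the ground-truth tilt errors and shifts in the phantom filenames, and np.save-ing the outputs) could look roughly like the sketch below. Every name and value here (tilt_error_deg, shift_px, the array shapes) is an illustrative stand-in, not the tutorial's actual variables:

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical ground-truth perturbations applied when simulating the phantom;
# encoding them in the filenames keeps the ground truth attached to the data.
tilt_error_deg = 1.0
shift_px = 2.5

tilt_angles = np.linspace(-60, 60, 61)    # degrees
tilt_series = np.zeros((61, 100, 100))    # (n_tilts, ny, nx), stand-in data

tag = f"tilterr{tilt_error_deg:g}deg_shift{shift_px:g}px"
outdir = Path(tempfile.mkdtemp())
np.save(outdir / f"tilt_series_{tag}.npy", tilt_series)
np.save(outdir / f"tilt_angles_{tag}.npy", tilt_angles)

# Round-trip check: the ground truth is recoverable from disk. The same
# pattern works for the reconstructed volume and final alignment values.
loaded = np.load(outdir / f"tilt_angles_{tag}.npy")
```

The same two np.save calls cover the later request to persist at least the reconstructed volume and final alignments.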

@cedriclim1

@arthurmccray Ready for review again, I think I addressed most of your comments:

no capital letters in directories please (/tutorials/tomography/Notebooks -> tutorials/tomography/notebooks and scripts)

Has been addressed, see f1ebced

shouldn't phantom_dataset.ipynb be something like /tomography/notebooks/tomography_00_generate_phantom.ipynb or, if there are going to be additional simulation notebooks in the future (e.g. for hyperspectral, through-focal, etc.), tomography/notebooks/simulation_notebooks/basic_phantom.ipynb

Has been addressed, see f1ebced

also the notebook for making the dataset should save the tilt_series and tilt_angles with metadata in the filenames (tilt errors and shifts so you know what the ground truth is later)

Has been addressed, data folder includes the tilt stack and tilt angles with 1 degree rotational misalignment.

fourier crop vs binning... don't want to confuse people.

Addressed in main PR

let's also move hyperparameter_optimization.ipynb to diffractive_imaging

Moved.

I'm getting lazy, so I'll just address the ones that need a response.

what's up with the factor of 42 scaling difference for SIRT vs INR? Is it always 42? If so it should be handled internally; if not, what's causing it?

Need to track this down, but it is a systematic ~42x scaling factor across different object sizes.

cedriclim1 marked this pull request as ready for review March 3, 2026 21:52

arthurmccray commented Mar 23, 2026

Looks good! I ran all the tutorials and notebooks on NERSC and everything worked. What remains at this point is just more instructions for users and filling out the README.

For the notebooks:

  • More details on how to launch the tensorboard session. I did so from the VSCode "launch tensorboard" but I'm not sure how it was intended (since I was running on a remote I would probably need to set up port forwarding, and the in-line display didn't show up for me for some reason)
  • add the default "name" for the obj_constraints and dset_constraints or generally prevent that error.
  • Add explicitly to the tops of the notebooks, or to a README, that notebooks don't work for multiple nodes. Also preferably include what you would change if running on multiple GPUs from the notebook (if that's supported).
  • I'm still confused by the factor of 42 rescaling (where does it come from, are the units arbitrary for SIRT or is it always a factor of 42 difference?). You don't need to answer that now, but it's confusing that you have the note about it in the second notebook but then there's no explanation of why or how to account for it.
  • please add a little explanation as to why you have to do the torch.mp cleanup. (Could this be called at the end of each reconstruct? Oftentimes there's a ddp_cleanup() type method and this would live in that)
  • I would suggest fleshing out the comments and explanations across the board; you could do this yourself or have an LLM do it and then review it. E.g. # warmup schedule for 10 epochs should also note that the tuples are (epoch_number, num_rays) (also, if you specify it this way, do you have to have len(num_samples_per_ray) == num_iter?)
  • Mention somewhere that the first epoch takes longer
  • please add some form of loss plotting for visualizing the loss curve (and ideally also the LRs if you track them)
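
For the loss/LR plotting point, something as small as the sketch below might do; loss_history and lr_history are hypothetical per-epoch lists that the training loop (or a tensorboard log reader) would fill in:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also works on a remote node
import matplotlib.pyplot as plt

# Hypothetical per-epoch values; the real numbers would come from the
# reconstruct() loop or be read back out of the tensorboard event files.
loss_history = [1.0 / (1 + e) for e in range(50)]
lr_history = [1e-3 * 0.95**e for e in range(50)]

fig, (ax_loss, ax_lr) = plt.subplots(1, 2, figsize=(8, 3))
ax_loss.semilogy(loss_history)
ax_loss.set(title="INR reconstruction loss", xlabel="epoch", ylabel="loss")
ax_lr.semilogy(lr_history)
ax_lr.set(title="learning rate", xlabel="epoch", ylabel="LR")
fig.tight_layout()
```

A log-scaled y-axis makes the late-epoch behavior of both curves visible, which a linear axis tends to flatten out.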

For the scripts

  • Add remaining details/be explicit for how to create the environment and run the example scripts
    1. create conda environment with python (provide example with --prefix)
    2. install torch from here with cuda 3.10
    3. clone quantem, switch to appropriate branch if necessary, and pip install -e .
    4. optionally install jupyterlab/ipython
    5. clone the quantem-tutorials repo
    6. Edit the run.sh and run_job.sh scripts in the appropriate places (maybe add a couple comments in those files as well)
    7. keep the "how to run" info for how to get an interactive node (include that you need to specify --ntasks-per-node and similar)
    8. how to start tensorboard (through jupyter.nersc presumably, basically just a link to here)
  • Can the scripts print out where the log files are being written? That would be helpful
  • Change the run_job.sh to debug and a 30 or 10 min runtime for faster queuing

cedriclim1 requested review from gvarnavi and removed request for gvarnavi March 23, 2026 22:09

cedriclim1 commented Mar 24, 2026

More details on how to launch the tensorboard session. I did so from the VSCode "launch tensorboard" but I'm not sure how it was intended (since I was running on a remote I would probably need to set up port forwarding, and the in-line display didn't show up for me for some reason)

Added descriptions to the tops of both the tomography_01 and tomography_02 tutorial notebooks. It seems the inline Tensorboard display is not working anymore, probably an ipykernel thing. Anyway, I added some helpful tips for launching a tensorboard session from the terminal.

add the default "name" for the obj_constraints and dset_constraints or generally prevent that error.

Fixed! I think I just forgot to do it for the pretraining example; should be updated now.

Add explicitly to the tops of the notebooks, or to a README, that notebooks don't work for multiple nodes. Also preferably include what you would change if running on multiple GPUs from the notebook (if that's supported).

There's now a new README.md file for the notebooks here, explicitly saying that the notebooks are not suited for multi-node runs since torchrun is required.

I'm still confused by the factor of 42 rescaling (where does it come from, are the units arbitrary for SIRT or is it always a factor of 42 difference?). You don't need to answer that now, but it's confusing that you have the note about it in the second notebook but then there's no explanation of why or how to account for it.

This has been handled in the backend for a long time; the comment was just left over in the tutorial.

please add a little explanation as to why you have to do the torch.mp cleanup. (Could this be called at the end of each reconstruct? Oftentimes there's a ddp_cleanup() type method and this would live in that)

Described in tomography_02.
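
For reference, one plausible shape for that cleanup, sketched against torch.distributed (the helper name ddp_cleanup and where it gets called are assumptions here, not the actual quantem API):

```python
import torch.distributed as dist

def ddp_cleanup() -> None:
    """Tear down the default process group if one was initialized.

    torch.multiprocessing / torchrun workers hold NCCL (or gloo) resources
    until the process group is destroyed; calling something like this at the
    end of each reconstruct() would avoid hangs and leaked worker processes
    on re-runs inside a notebook.
    """
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()

# Safe to call even in a plain single-process notebook session (no-op there):
ddp_cleanup()
```

Folding this into reconstruct() itself, as suggested above, would spare notebook users from having to remember the manual cleanup cell.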

I would suggest fleshing out the comments and explanations across the board--you could do this yourself or just have an LLM do it and then review it. E.g. # warmup schedule for 10 epochs should also include that the tuples are (epoch_number, num_rays) (also if you specify it this way do you have to have len(num_samples_per_ray) == num_iter?)

Added a few more explanations, but let me know if they're still not clear enough.

Mention somewhere that the first epoch takes longer

Done

please add some form of loss plotting for visualizing the loss curve (and ideally also the LRs if you track them)

Done, see b0171fd and 5923624.

Add remaining details/be explicit for how to create the environment and run the example scripts

README.md for HPC is now updated using your steps: 8c9ba61 and here.

Can the scripts print out where the log files are being written? That would be helpful

Added in f4835b7.

Change the run_job.sh to debug and a 30 or 10 min runtime for faster queuing

Made it more explicit what to change in run_job.sh and run.sh, including that users must switch out of the debug queue if they want a longer job. For now the debug queue is the default.

arthurmccray merged commit c5c4c60 into main Mar 25, 2026