Welcome! This guide teaches you how to contribute to open source using a fork-based workflow. We use quantem as an example, but these patterns work for any project. The goal: you'll never lose work, your collaborators will always know what you're doing, and your code history will tell a coherent story. Contributing to open source is one of the most rewarding ways to grow as a scientist and developer. Writing code to a high standard can feel difficult and overwhelming at first, but it is a great investment in your career. The Git and GitHub skills you build here will serve you for a lifetime.
If you have any questions or find instructions unclear, reach out to @bobleesj.
| Case | When to use | PR target |
|---|---|---|
| 1 | Initial/private algorithm; colleague wants to test and contribute | <username>/quantem:<branch> |
| 2 | Large features, multi-person collaboration | electronmicroscopy/quantem:<branch> |
| 3 | Typos, small bugs, documentation | electronmicroscopy/quantem:dev |
- How do I set up my computer?
- Making your first contribution
- GitHub issues and pull requests
- Coding standards
- Troubleshooting
- Appendix A: Glossary
- Appendix B: Acronyms
- Appendix C: SSH for GitHub
- Appendix D: Docstrings, tests, error messages, coordinates, and type hints
Before contributing, we invite you to read through:
- The naming philosophy behind clone, fork, origin, upstream, fetch, merge, and pull (see Appendix A).
- The coding standards on docstrings, tests, error messages, coordinates, and type hints (see Appendix D).
- How do I write great pull request and issue titles?
- Install Git: https://git-scm.com/
- Windows user? Install Git Bash: https://gitforwindows.org/
- Set up SSH for GitHub so you don't have to enter your password every time (see Appendix C).
- Install GitHub CLI: https://cli.github.com/
- Go to https://github.com/electronmicroscopy/quantem and click Fork.
- Clone your fork:
git clone https://github.com/<your-username>/quantem.git
cd quantem
- Add the upstream remote:
git remote add upstream https://github.com/electronmicroscopy/quantem.git
- Verify remotes with git remote -v. You should see both origin (your fork) and upstream (the org repo).
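The remote check in the last step can also be scripted. The sketch below is illustrative only: parse_remotes and missing_remotes are hypothetical helper names (not part of Git or quantem), and the sample text assumes the usual layout of git remote -v, with one fetch line and one push line per remote.

```python
def parse_remotes(remote_v_output: str) -> dict[str, str]:
    """Map each remote name to its fetch URL, given `git remote -v` output."""
    remotes: dict[str, str] = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Each line looks like "<name> <url> (fetch)" or "<name> <url> (push)".
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def missing_remotes(remote_v_output: str, required=("origin", "upstream")) -> list[str]:
    """Return the required remote names that do not appear in the output."""
    found = parse_remotes(remote_v_output)
    return [name for name in required if name not in found]


# Sample output pasted from a terminal (placeholder username kept as-is).
example_output = """\
origin\thttps://github.com/<your-username>/quantem.git (fetch)
origin\thttps://github.com/<your-username>/quantem.git (push)
upstream\thttps://github.com/electronmicroscopy/quantem.git (fetch)
upstream\thttps://github.com/electronmicroscopy/quantem.git (push)
"""

print(missing_remotes(example_output))  # prints []
```

If the list is not empty, the named remote still needs to be added with git remote add.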
For installing quantem in development mode with uv, setting up pre-commit hooks, and managing dependencies, see the quantem CONTRIBUTORS.md.
There are three common cases when contributing to quantem:
- Case 1: Prototyping on your fork - initial/private algorithm development; colleague helps test
- Case 2: Major feature development - larger features requiring collaboration on upstream
- Case 3: Quick fixes - typos, small bugs, small features, documentation updates
Assume Will is prototyping a new alignment algorithm on his personal fork. The code isn't ready for the main repository yet, but Will asks Colin for help testing and improving it.
Here's an overview. Follow the steps below first, then use this diagram as a reference:
- Will checks out his local dev branch and pulls the latest from upstream (after cloning, the local dev branch has the same commit history as the dev branch on his fork):
git checkout dev
git pull upstream dev
- Will creates a local branch off of the latest commits from dev:
git checkout -b align
- Will makes changes, commits, and uploads the local branch to his fork:
git add <files>
git commit -m "Add initial alignment algorithm"
git push -u origin align
Note: The -u flag sets upstream tracking. You only need it the first time you push a new branch. After that, git push is sufficient.
Will continues iterating on wwmills/quantem:align. Now Colin wants to contribute and test Will's code. How does Colin contribute?
- Colin adds Will's fork URL so he can fetch Will's latest commits:
git remote add will https://github.com/wwmills/quantem.git
- Colin fetches Will's commits and checks out Will's branch:
git fetch will
git checkout will/align
- Colin creates a local branch off of Will's align branch:
git checkout -b align-subpixel
- Colin makes changes and commits:
git add <files>
git commit -m "Add subpixel alignment support"
- Colin uploads his local align-subpixel branch to Colin's fork (https://github.com/cophus/quantem):
git push -u origin align-subpixel
- Colin visits https://github.com/wwmills/quantem and clicks the green Compare & pull request button to create a PR from cophus/quantem:align-subpixel to wwmills/quantem:align. Colin follows the guidelines in Making the pull request review process effective.
- Will reviews and merges Colin's PR into wwmills/quantem:align.
- When the feature is ready, Will creates a PR from wwmills/quantem:align to electronmicroscopy/quantem:dev.
- A maintainer reviews and merges it. Colin's commits are preserved in the contribution history.
Note: If the PR to quantem/dev would be too large (e.g., thousands of lines), consider using Case 2 instead. We don't want to overwhelm core reviewers with massive PRs. Case 2 creates a feature branch on upstream where multiple people can contribute iteratively with smaller, reviewable PRs before merging to quantem/dev.
Here we use the example of Bob. Bob is building drift correction for quantem using PyTorch to speed up computation. This is an actual workflow Bob uses to collaborate with Will. The feature is too large for a single PR and requires multiple iterations before it's ready for quantem/dev. How does Bob collaborate with Will so they can build and contribute collectively?
Here's an overview. Follow the steps below first, then use this diagram as a reference:
- Go to https://github.com/electronmicroscopy/quantem.
- Click the branch dropdown (shows quantem/dev), type the new branch name (e.g., drift-torch), and click Create branch: drift-torch from quantem/dev.
When multiple people contribute to a branch on upstream, each person creates local branches and PRs iteratively.
- Bob fetches the branch and pulls the latest:
git fetch upstream
git checkout drift-torch
git pull upstream drift-torch
- Bob creates a branch named after the specific feature:
git checkout -b drift-torch-rigid
- Bob makes changes and commits:
git add <files>
git commit -m "Add rigid transformation for drift correction"
- Bob uploads his local drift-torch-rigid branch to his fork (origin):
git push -u origin drift-torch-rigid
- Bob visits https://github.com/electronmicroscopy/quantem and clicks the green Compare & pull request button to create a PR from bobleesj/quantem:drift-torch-rigid to electronmicroscopy/quantem:drift-torch (not dev). Bob follows the guidelines in Making the pull request review process effective.
For Bob's next contribution, Bob does not branch off from drift-torch-rigid. Instead, Bob starts from the latest upstream/drift-torch branch. It is the source of truth and contains merged commits from all contributors:
- Bob switches to the branch and pulls the latest:
git checkout drift-torch
git pull upstream drift-torch
- Bob creates a new branch for the next feature:
git checkout -b drift-torch-affine
- Bob makes changes, commits, and uploads his local drift-torch-affine branch to his fork:
git add <files>
git commit -m "Add affine transformation for drift correction"
git push -u origin drift-torch-affine
- Bob visits https://github.com/electronmicroscopy/quantem and clicks the green Compare & pull request button to create a PR from bobleesj/quantem:drift-torch-affine to electronmicroscopy/quantem:drift-torch. Bob follows the guidelines in Making the pull request review process effective.
Will wants to contribute to the same branch. Will sees drift-torch, which contains the latest commits merged from drift-torch-rigid and drift-torch-affine. Will has experimental data and tests the code across multiple dimensions, debugging and fixing issues from previous commits:
- Will fetches the branch and pulls the latest:
git fetch upstream
git checkout drift-torch
git pull upstream drift-torch
- Will creates a branch for adding tests:
git checkout -b drift-torch-test
- Will makes changes, commits, and uploads his local drift-torch-test branch to his fork:
git add <files>
git commit -m "Add unit tests for drift correction"
git push -u origin drift-torch-test
- Will visits https://github.com/electronmicroscopy/quantem and clicks the green Compare & pull request button to create a PR from wwmills/quantem:drift-torch-test to electronmicroscopy/quantem:drift-torch. Will follows the guidelines in Making the pull request review process effective.
Will continues with drift-torch-validate, drift-torch-large-images, etc. Both Bob and Will can contribute simultaneously. The feature-based naming (-rigid, -test, -validate) keeps everyone's work organized and descriptive.
Tip: To avoid merge conflicts, communicate with your team before starting work. Ideally, two people should not edit the same file at the same time. A quick message ("I'm working on drift.py") helps prevent conflicts.
When the feature is complete and tested, either Bob or Will can create a PR from electronmicroscopy/quantem:drift-torch to electronmicroscopy/quantem:dev following the pull request guidelines. A maintainer reviews and merges it.
You spotted a typo in the README. The fix is small enough that it can go directly to quantem/dev as a single PR. How do you contribute it?
Here's an overview. Follow the steps below first, then use this diagram as a reference:
- Switch to your local dev branch (it already exists after cloning) and pull the latest commits from upstream/dev:
git checkout dev
git pull upstream dev
- Create a local branch called fix-readme-typo:
git checkout -b fix-readme-typo
- Edit README.md using your favorite IDE.
- Stage and commit the changes:
git add README.md
git commit -m "Fix typo in README installation section"
- Upload your local fix-readme-typo branch to your fork (origin):
git push -u origin fix-readme-typo
- Verify the branch exists on GitHub by visiting https://github.com/<your-username>/quantem/tree/fix-readme-typo.
- Go to https://github.com/electronmicroscopy/quantem and click the green Compare & pull request button to create a PR from <your-username>/quantem:fix-readme-typo to electronmicroscopy/quantem:dev. Follow the guidelines in Making the pull request review process effective.
GitHub issues are where we discuss what to build, report bugs, and converge on design decisions before writing code. Not everyone can join the bi-weekly dev meeting, but everyone can participate in a GitHub issue. It's a democratized platform where anyone can propose ideas, ask questions, and collaborate regardless of time zone or schedule. Every issue has a permanent URL that anyone can reference months or years later to understand why a feature was designed a certain way, who contributed to the discussion, and what alternatives were considered.
We encourage you to open issues freely. You don't need to have a solution to start a conversation. Before implementing a new feature, open an issue first. This gives the team a chance to discuss the problem, explore potential solutions, and align on an approach. It also engages potential users early. See Issue #149 for an example where the team discussed the 5D-STEM dataset design before implementation.
Tag people with @username to send direct notifications.
An issue can be divided into two sections:
### Problem
[Describe the problem or feature request]
### Proposed solutions
[Describe possible approaches]
If you're reporting a bug, you don't have to propose a solution. Reporting the problem is valuable on its own. Attach screenshots, error messages, and metadata (Python version, OS, package versions) as needed to help with debugging. If you're proposing a new feature, put effort into the proposed solutions section. Include screenshots, API design, and use cases so that potential reviewers can read it and brainstorm with you.
Examples:
- Issue #136 - bug report (Python 3.13 type alias compatibility)
- Issue #138 - feature request (quantem.__version__ support)
- Issue #149 - feature design discussion (pre-implementation alignment)
- Issue #105 - architecture discussion (whether to add Widget module)
When a PR addresses an issue, use Closes #<issue-number> in the PR body. Once merged, the issue will automatically close (see PR #151).
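To make the closing-keyword convention concrete, here is a minimal sketch that extracts the issue numbers a PR body would link. It assumes GitHub's documented closing keywords (close, fix, resolve and their -s/-d forms) and deliberately ignores cross-repository references; linked_issues is a hypothetical helper, and the issue numbers in the example are made up.

```python
import re

# GitHub's closing keywords; matching is case-insensitive.
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)
_CLOSING_PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)",
    re.IGNORECASE,
)


def linked_issues(pr_body: str) -> list[int]:
    """Return issue numbers referenced with a closing keyword, in order of appearance."""
    return [int(number) for number in _CLOSING_PATTERN.findall(pr_body)]


body = "### What problem does this PR address?\nCloses #12 and fixes #34. Related to #56."
print(linked_issues(body))  # prints [12, 34]
```

Note that "Related to #56" is a plain mention: it links the issue in the GitHub UI but does not close it on merge.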
- Start as a Draft PR while work is in progress.
- Write a short, descriptive title (see How do I write great pull request and issue titles?).
- In the body, showcase the problem we are solving. Attach screenshots, plots, and design visuals. Our reviewers are colleagues here to see how we use Python to solve a scientific problem.
Every PR has a public URL. The more accessible we make it through visuals and clear writing, the more people can give us feedback without running anything. More reviewers means more input, more potential users, and more impact for our code. In my experience, a PR is effective when it follows this minimal framework:
### What problem does this PR address?
Closes #<issue-number>
[Describe the problem this PR solves. Focus on inputs/outputs. Attach screenshots, plots, and before/after comparisons.]
[Show the function signatures, class interfaces, and how a scientist would use this in a notebook. The API is what a scientist actually types. Get this right first.]
### What should the reviewer(s) do?
[Explain how to test, what to look for, or any dependencies on other PRs]
Note: Writing represents our internal state as the author. The goal is to externalize our reasoning so the team can make decisions collectively. We write as little as required, but as much as needed. A PR is self-serving documentation for developers and debugging along with commits. Reviewing takes time, and writing effectively respects the reviewer's time so we can prioritize advancing science.
- Before tagging a reviewer, go to Files changed and review our own code. We'll catch mistakes and save everyone time. When ready, tag the reviewer and say "Ready for review."
- After receiving feedback, optionally turn the reviewer's comments into a checklist and check off items. This respects the reviewer's input, acknowledges their feedback, and serves as a to-do list for everyone. If the list is too long or beyond scope, turn it into a GitHub issue for later tracking. After making changes, tag the reviewer again with "Ready for review." See PR #146 for an example where George and Colin provided design feedback, and this comment for an example checklist.
- Leave inline comments on your own PR to guide the reviewer. Don't make the reviewer guess why you made a decision. Add comments on your own diff pointing out non-obvious choices, trade-offs, or areas where you want specific feedback. This saves a round trip and shows you've thought it through.
For reviewers: We invite you to review others' work. It's one of the fastest ways to learn the codebase. Reviewing maintains our standards and ensures that code is understood by more than one person. When only the author understands the code, that's a weak link. A thoughtful review reduces those weak links, catches bugs before they reach a scientist's notebook, and helps the author grow. Respect the author's time by being specific, and respect your own time by not re-reviewing things pre-commit already handles.
- First, understand the nature of the problem and how solving it benefits scientists and the community. Not the design, not the code, not the technology. If the problem is unclear, ask the author to open an issue and discuss what problem the PR solves before adding more commits.
- Check out the branch locally and run the tests.
- Focus on correctness, API design, test coverage, and docstrings. Don't nitpick formatting if pre-commit handles it.
- Be specific. "This will break if scan_shape has an odd dimension" is useful. "Needs work" is not.
- Approve when it's good enough, not perfect. If there are remaining concerns, discuss whether another iteration is needed or encourage the author to create an issue to address potential bugs and improvements down the road.
- If you request changes, say what "done" means to you as a reviewer.
Examples:
We have a common goal of advancing science. When reviewers disagree, you may adopt the following steps:
- Discuss the code, not the coder. "This approach has O(n^2) complexity" is fine. "You always write slow code" is not.
- Try to resolve it between yourselves first. Use inline comments on the PR. If needed, follow up with a private message or a quick call.
- If stuck, bring in a third person. A fresh perspective breaks deadlocks.
- If still unresolved, present both approaches at the bi-weekly dev meeting. The higher-impact idea wins, not the louder voice or seniority.
- Once decided, commit fully. No passive resistance. Code is always evolving. The decision may not be the most optimal one right now, but it can change. If you have a strong reason to revisit it later, bring it up again.
The best way to avoid disagreements at the PR level is to not have them in the first place. This is why we write issues and communicate our design and problems clearly before writing code. A PR should be an implementation of a plan the team already agrees on.
Why do titles matter? Our goal is to advance science, not spend time debugging. Every hour someone spends tracing a regression through vague commit messages is an hour not spent on research. PR titles end up in git log, release notes, and blame annotations. When someone runs git log --oneline a year from now to find where a bug was introduced, "Infrastructure changes" is a dead end. "Fix hot-pixel filter zeroing valid data on Arina datasets" points them straight to the answer. The time you invest writing a clear title once saves the entire team time forever.
This is part of a broader philosophy: once your code works in your fork and you're past the prototype stage, it's often worth spending time upfront on clear communication and good infrastructure. Prototyping is fast and messy by design. But when code moves to upstream, it becomes shared property. Clear titles, descriptions, and commit messages let the group make decisions collectively and trace problems back to their origin.
A PR title is the first thing a reviewer reads and the last thing a debugger searches.
Format: Start with a verb. Cover what the change is and answer "so what?" (why it matters).
# Bad - what file, no context
Updated vector.py
Infrastructure changes
Bug fix
# Good - says what, but not why it matters
Add cell-level indexing to Vector
Fix hot-pixel filter
Remove deprecated parameter
# Great - what changed and so what
Fix hot-pixel filter zeroing valid data on Arina SNSF datasets
Rename rotation_angle to rotation_angle_deg for explicit degree input in direct ptycho
Issue titles follow the same pattern. Every title must cover two things: (1) what's happening (bug) or what needs to be done (feature) and (2) so what?
# Bad - vague, no "so what?"
Problem with dataset
Question about API
# Great
Dataset4dstem.fourier_resample crashes on odd scan dimensions, blocking 3D reconstruction
Add quantem.__version__ so users can report their version in bug reports
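The PR title format above (start with a verb, say what changed, answer "so what?") can be approximated with a crude lint. This is a rough heuristic sketch, not a project tool: title_problems, the verb list, and the six-word threshold are all assumptions chosen for illustration, and word count is only a weak proxy for whether a title explains why the change matters.

```python
# Hypothetical starter list of imperative verbs; extend it to suit the project.
LEADING_VERBS = {"add", "fix", "remove", "rename", "update", "refactor", "document", "support"}


def title_problems(title: str, min_words: int = 6) -> list[str]:
    """Return reasons a PR title may be too vague (empty list if none were found)."""
    words = title.split()
    if not words:
        return ["title is empty"]
    problems: list[str] = []
    if words[0].lower() not in LEADING_VERBS:
        problems.append(f"starts with '{words[0]}' rather than an imperative verb such as Add or Fix")
    if len(words) < min_words:
        problems.append(f"only {len(words)} word(s); probably too short to answer 'so what?'")
    return problems


for title in (
    "Updated vector.py",
    "Fix hot-pixel filter",
    "Fix hot-pixel filter zeroing valid data on Arina SNSF datasets",
):
    print(title, "->", title_problems(title) or "looks fine")
```

A check like this cannot judge whether a title is actually informative. It only catches the most obvious patterns, so a human read is still the real test.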
GitHub CLI (gh) provides shortcuts for common tasks. See GitHub CLI installation if not already installed.
List open pull requests:
gh pr list
Check out a specific PR by number:
gh pr checkout 146
This creates a local branch with the PR's changes so you can test or review the code.
You may also use gh issue list and gh issue view <number> to view issues from the command line. For more commands, see the GitHub CLI documentation.
We encourage everyone to contribute early and often. Everyone on this team is balancing research, coursework, and life. The time someone spends reviewing our code, debugging our error messages, or deciphering our commit history is time they could have spent on science. These guidelines are here to respect our collective time:
We focus on the code, not the person. We keep feedback constructive and stay neutral in PRs and issues. We provide feedback to help each other improve.
We stay accountable and responsive. A reviewer waiting on your reply can't move forward. If you need more time, say so. "I'll address this by Thursday" is better than silence. Fast iteration keeps everyone unblocked.
We maintain balanced code quality. We follow NumPy docstring conventions and PEP 8 standards, and provide example notebooks and tests where necessary.
We ship, but not broken code. Prototype freely in your fork or a branch on upstream. Unoptimized code is fine. Broken code stays in the fork until it's fixed.
We align before we build. Before taking on a large PR, bring it up at the bi-weekly quantem meeting or open a GitHub issue. Ten minutes of discussion can save weeks of wasted work on something the team doesn't need or would design differently.
Don't commit directly to your local dev branch. Those commits won't match upstream/dev, and you'll need to reset. Instead, always branch: git pull upstream dev then git checkout -b <branch-name>.
Don't stage everything blindly. Use git add <specific-files> instead of git add . to avoid accidentally committing .env files, API keys, or large datasets. Before submitting a PR, check Files changed to confirm only intended files are included.
Don't force push to shared branches. In Case 2, multiple people contribute to upstream/drift-torch. Force pushing can erase a teammate's work. Use PRs to merge changes instead.
Don't let upstream feature branches live too long. The longer a branch diverges from dev, the more merge conflicts accumulate and the harder the final merge becomes. If possible, aim to merge feature branches into dev within a few weeks, not months.
Don't name branches too broadly. imaging is an entire module, not a feature. Name branches after the specific feature: imaging-cellview, imaging-roi-export, etc. Use clear, consistent naming as shown in Case 2 (drift-torch-rigid, drift-torch-affine, drift-torch-test).
Don't start a large PR without alignment. See "We align before we build" in the guidelines above.
Don't mix unrelated changes in one PR. One PR, one purpose: a feature, a refactor, or a bug fix. When unrelated changes get bundled together, reviewers spend more time untangling what changed than evaluating whether it's correct.
Don't leave review comments unanswered. Respond to each comment saying whether it's been addressed or is out of scope. Unanswered comments leave reviewers guessing and slow down the next round.
Don't over-engineer tests. Every test a human has to review, maintain, and debug costs real time. More lines of code is not better. Respect human time above anything else. Write tests that cover how scientists actually use the code, not every possible edge case. See D2. How do I write effective tests? for details.
Don't skip the Examples section in docstrings. Most public functions should have at least one usage example. See D1. How do I write effective docstrings? for details.
Don't write cryptic error messages. Error messages should guide the user on what to do next without digging into the entire codebase. See D3. How do I write great error messages? for details.
Don't create your own coordinate system. Use the (row, col) convention in quantem. See D4. How do I represent coordinates in NumPy, Matplotlib, and quantem? for details.
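Don't use legacy type hints. Optional, List, Dict, Tuple, and Union from typing are no longer needed in Python 3.11+: write X | None, list, dict, tuple, and X | Y instead. Any has no built-in replacement, so import it from typing only when it is truly needed. See D5. How do I use type hints? for details.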
Don't create coding conventions. This includes comments, line spacing, and formatting. We follow NumPy docstring conventions and PEP 8 standards.
Prototyping is fast and messy by design, and you may even submit scientific code and publish a paper with it. But we make it open source to increase impact across the community. We encourage you to experiment freely in your fork. But the moment code hits upstream, it becomes shared responsibility. Everyone's time is extremely valuable. Putting the effort into clear, well-structured code respects every scientist who will read, review, and build on it. The code represents our standards, and we want those standards to help scientists. See Appendix D: Coding standards for the details.
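I accidentally committed to dev instead of a new branch
If you haven't pushed yet:
git reset --soft HEAD~1
This undoes the commit but keeps your changes staged. Then create your new branch and recommit:
git checkout -b my-feature
git commit -m "Your message"
If you already pushed to your fork's dev, first save the commit on a new branch, then reset dev to match upstream:
git checkout dev
git branch my-feature
git fetch upstream
git reset --hard upstream/dev
git push origin dev --force
git checkout my-feature
The git branch step matters: git reset --hard discards the commit from dev, so without the new branch your work would be lost. Afterwards your commit lives on my-feature, and both your local dev and your fork's dev match upstream/dev again.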
I have merge conflicts
Merge conflicts happen when two people edit the same lines. Git marks conflicts like this:
<<<<<<< HEAD
your changes
=======
their changes
>>>>>>> branch-name
To resolve:
- Open the conflicted file and look for <<<<<<< markers.
- Decide which version to keep (or combine both).
- Remove the conflict markers (<<<<<<<, =======, >>>>>>>).
- Stage and commit:
git add <file>
git commit -m "Resolve merge conflict in <file>"
Prevention tip: Communicate with your team. If two people need to edit the same file, coordinate timing or split the work into different sections.
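Before staging a resolved file, it can help to confirm that no conflict markers were left behind. The sketch below scans a file's text for the three marker lines shown above. find_conflict_markers is a hypothetical helper, and the check is intentionally simple: a line of exactly seven equals signs that is legitimately part of a file would be reported as a false positive.

```python
def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for leftover Git conflict markers."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Opening and closing markers carry a label (e.g. "HEAD"); the divider stands alone.
        if line.startswith(("<<<<<<<", ">>>>>>>")) or line == "=======":
            hits.append((lineno, line))
    return hits


conflicted = """\
<<<<<<< HEAD
your changes
=======
their changes
>>>>>>> branch-name
"""
print(find_conflict_markers(conflicted))
# prints [(1, '<<<<<<< HEAD'), (3, '======='), (5, '>>>>>>> branch-name')]
```

An empty list means no markers were found in that text.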
I made changes but Git says "nothing to commit"
Common causes:
- You're in the wrong directory. Run pwd and make sure you're inside the repository.
- Changes aren't saved. Save your files in your editor.
- Files are gitignored. Check .gitignore to see if your file pattern is excluded.
- You already committed. Run git log to see recent commits.
I want to undo my last commit
If you haven't pushed yet, run one of the following (not all three, since each one undoes a commit):
git reset HEAD~1 --soft # Keeps changes staged
git reset HEAD~1 --mixed # Keeps changes unstaged (default)
git reset HEAD~1 --hard # Discards changes entirely (careful!)
If you already pushed, it's safer to make a new commit that reverses the changes:
git revert HEAD # Creates a new commit undoing the last one
git push origin <branch>
Git says "Your branch is behind" when I try to push
The upstream branch has new commits. Pull the latest first:
git pull upstream <branch>
If there are conflicts, resolve them (see merge conflicts above), then push.
What is the difference between remote and local?
Remote is a repository hosted on GitHub (under your account or an organization). Local is the copy on your computer.
Example: https://github.com/electronmicroscopy/quantem is remote. The folder on your computer after git clone is local.
What is the difference between fork and clone?
Fork creates a personal copy on GitHub under your account. Clone downloads a repository to your local machine.
Example: Clicking "Fork" on GitHub creates https://github.com/bobleesj/quantem. Running git clone downloads it to your computer.
What is the difference between origin and upstream?
Origin points to your fork. Upstream points to the original repository.
Example: origin = https://github.com/bobleesj/quantem. upstream = https://github.com/electronmicroscopy/quantem.
What is the difference between branch and fork?
A branch is a parallel version of code within the same repository. A fork is a copy of the entire repository under a different account.
Example: quantem/dev and quantem/drift are branches. electronmicroscopy/quantem and bobleesj/quantem are forks.
What is the difference between pull and fetch?
Fetch downloads changes from a remote but doesn't apply them. Pull fetches and merges the changes into your current branch.
Example: git fetch upstream downloads updates. git pull upstream dev downloads and merges into your local dev.
What is the difference between git add and git commit?
git add stages changes (prepares them to be committed). git commit saves the staged changes to your local repository.
Example: git add file.py stages the file. git commit -m "Add feature" saves it with a message.
What is the difference between git pull and git push?
git pull downloads changes from a remote and merges them into your local branch. git push uploads your local commits to a remote.
Example: git pull upstream dev gets updates from upstream. git push origin fix-typo sends your commits to your fork.
What is a PR (pull request)?
A request to merge your changes from one branch into another.
Example: PR from bobleesj/quantem:fix-typo to electronmicroscopy/quantem:dev.
What is detached HEAD state?
When you check out a remote-tracking branch directly (like will/align), Git puts you in "detached HEAD" state. This means you're not on a local branch. You're viewing a snapshot of the remote.
Example from Case 1: When Colin runs git checkout will/align, he's in detached HEAD state. Any commits made here won't belong to a branch and could be lost. That's why Colin creates a local branch with git checkout -b align-subpixel. This saves his work to a proper branch.
How do I temporarily save uncommitted changes (git stash)?
When you have uncommitted changes and need to switch branches, Git refuses to switch if doing so would overwrite those changes. Use git stash to temporarily save your changes, switch branches, and then restore them later with git stash pop.
git stash # Save uncommitted changes
git checkout dev # Switch to another branch
# ... do other work ...
git checkout my-branch
git stash pop # Restore your saved changes
What is git push --force?
A normal git push fails if the remote branch has commits your local branch doesn't have. Git protects you from accidentally overwriting work. git push --force tells Git: "I know the histories don't match. Replace the remote with my local version anyway."
Force push is safe for your own branches on your fork (you're only affecting yourself), but dangerous for shared branches on upstream (you could delete teammates' commits). Use PRs instead of force pushing to shared branches.
- PR - Pull Request
- SSH - Secure Shell
- CLI - Command Line Interface (e.g., Terminal, Warp, Git Bash)
- IDE - Integrated Development Environment (e.g., VS Code, PyCharm)
- API - Application Programming Interface
- UI - User Interface
- UX - User Experience
- GUI - Graphical User Interface
- CI/CD - Continuous Integration / Continuous Deployment (automated testing and deployment in the cloud, e.g., GitHub Actions)
SSH allows you to push and pull without entering your password every time.
In your terminal, run the following commands to generate a new SSH key pair. Replace <email@example.com> with your email address.
mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -o -t rsa -C "<email@example.com>"
cat id_rsa.pub
- Visit https://github.com/settings/keys.
- Click New SSH key.
- Set the Title as <your-computer-name>-key.
- Under Key, copy and paste the content of the id_rsa.pub file. It should start with ssh-rsa and end with your email address.
- Click Add SSH key.
- Done!
Why write docstrings? They appear in three places: (1) VS Code and PyCharm show them when you hover over a function, so scientists get instant help without leaving their editor, (2) Sphinx pulls them into the official documentation automatically (see Show2D API docs for a live example), and (3) Jupyter notebooks show them when you press Shift+Tab. One docstring, three audiences, zero extra work.
We use NumPy-style docstrings. A docstring should answer two questions, in this order: (1) Why does this code exist? Every function solves a problem. State the problem first. (2) How does it work and why is it designed this way? Explain the approach and key design choices so contributors understand the reasoning, not just the interface. The reader is a scientist, not a code reviewer.
# Wrong - describes implementation
def roi_annular(self, inner_radius=None, outer_radius=None):
    """Create a boolean mask by computing pixel distances from the
    center using torch.cdist, then applying inner/outer thresholds."""

# Right - states the problem, then explains the design
def roi_annular(self, inner_radius=None, outer_radius=None):
    """Set ROI mode to annular for ADF/HAADF imaging.

    The annular ROI integrates over a donut-shaped region in the
    diffraction pattern. Use small inner radii for ADF, larger
    inner radii for HAADF. The virtual image updates immediately."""

When in doubt: "Would a scientist in the lab care about this sentence?" If no, it belongs in a code comment, not the docstring.
Docstrings support mathematical notation via reStructuredText. Use :math:`E = mc^2` for inline math and .. math:: blocks for display equations. This is especially useful for functions that implement known formulas, so the reader can verify the implementation against the literature.
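One practical detail when putting LaTeX in a docstring: backslashes are escape characters in ordinary Python strings, so a sequence such as \f in \frac is silently turned into a form-feed character. Prefixing the docstring with r (a raw string) keeps the math intact. The sketch below uses two hypothetical functions, plain_docstring and raw_docstring, that exist only to show the difference.

```python
def plain_docstring():
    """De Broglie wavelength :math:`\frac{h}{p}`."""


def raw_docstring():
    r"""De Broglie wavelength :math:`\frac{h}{p}`."""


# In the plain string, "\f" was interpreted as a form feed (0x0c),
# so the LaTeX command \frac no longer exists in the text.
print(repr(plain_docstring.__doc__))

# In the raw string, the backslash survives, so Sphinx receives \frac.
print(repr(raw_docstring.__doc__))
```

If a docstring contains any backslash, making it raw is the safer default.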
# The r prefix makes this a raw string, so LaTeX backslashes (\frac, \lambda) are not read as escape sequences.
def relativistic_wavelength(voltage_kv: float) -> float:
    r"""Compute the relativistic de Broglie wavelength of an electron

    At typical TEM accelerating voltages (80-300 kV), relativistic
    effects shorten the wavelength by several percent. This function
    uses the relativistic form:

    .. math::

        \lambda = \frac{h}{\sqrt{2 m_0 e V \left(1 + \frac{eV}{2 m_0 c^2}\right)}}

    Parameters
    ----------
    voltage_kv : float
        Accelerating voltage in kilovolts (e.g., 200 for a 200 kV TEM).

    Returns
    -------
    float
        Wavelength in angstroms.

    Examples
    --------
    >>> relativistic_wavelength(200)
    0.02508
    """
Short one-line summary (imperative mood, no period)

Longer description if needed. State the problem this
function solves, then explain the design. Supports
math notation: :math:`k = 2\pi / \lambda` for inline.

Parameters
----------
<name> : <type>
    What it represents and what units it uses.
<name> : <type>, optional
    What it controls. Default is <value>.

Returns
-------
<type>
    What the caller gets back.

Examples
--------
>>> <most common usage>
Complete example (non-widget function from quantem.core):
def fourier_resample(
    self,
    new_shape: tuple[int, int],
    method: str = "lanczos",
) -> Self:
    """Rescale scan dimensions without interpolation artifacts in real space

    Resampling in Fourier space avoids the ringing and aliasing that
    real-space interpolation introduces at sharp edges. This is the
    preferred way to change scan resolution after acquisition.

    Parameters
    ----------
    new_shape : tuple[int, int]
        Target (rows, cols) for the scan dimensions.
    method : str, optional
        Interpolation kernel. Default is "lanczos".

    Returns
    -------
    Self
        A new Dataset with the resampled scan dimensions.

    Examples
    --------
    >>> ds = Dataset4dstem("scan.h5")
    >>> ds_resampled = ds.fourier_resample((128, 128))
    >>> ds_resampled.scan_shape
    (128, 128)
    """
Key rules: write name : type with spaces around the colon; mark parameters that have defaults as optional; use the >>> prompt in Examples so Sphinx renders them correctly; give 2-3 examples, most common use case first.
- Don't skip Examples. Even a one-liner is better than nothing.
- Don't document _private methods. They don't appear in docs.
- Don't repeat the type hint. radius : float → "Radius of the circle in pixels", not "A float specifying the radius..."
- Don't describe implementation. "Compute the virtual image for the current ROI", not "sum pixels in the ROI mask using torch.sum."
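A side benefit of writing Examples with the >>> prompt is that Python's standard-library doctest module can execute them, so an example that has drifted from the code gets caught. The sketch below is illustrative only: annulus_area and run_examples are hypothetical names, not part of quantem, and this guide does not say whether the project runs doctests in CI.

```python
import doctest
import math


def annulus_area(inner_radius: float, outer_radius: float) -> float:
    """Compute the area of a ring-shaped region

    A ring between two radii has area pi * (outer^2 - inner^2).
    Hypothetical helper, used here only to demonstrate the format.

    Parameters
    ----------
    inner_radius : float
        Inner radius of the ring in pixels.
    outer_radius : float
        Outer radius of the ring in pixels.

    Returns
    -------
    float
        Area in square pixels, rounded to 5 decimal places.

    Examples
    --------
    >>> annulus_area(0.0, 1.0)
    3.14159
    >>> annulus_area(3.0, 5.0)
    50.26548
    """
    return round(math.pi * (outer_radius**2 - inner_radius**2), 5)


def run_examples(func) -> doctest.TestResults:
    """Run the >>> examples in func's docstring and return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples access to the function under its own name.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(annulus_area)
print(results)
```

A result with zero failures means every example in the docstring still matches what the function returns.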
Why write tests? Tests mimic how a scientist actually uses the code. Every test is a real scenario: construct the object, call the method, check the result. This does two things: (1) it catches regressions before they reach a scientist's notebook, and (2) it shows other contributors how the code is meant to be used. Docstrings explain why a function exists; tests show how it runs. GitHub Actions runs pytest on every PR, so a failing test blocks the merge before it can affect anyone else.
Test the way a scientist actually uses the code. If you wrote fourier_resample, the most important tests are: resample to a smaller shape, resample to a larger shape, check that the output type is correct. That's how scientists will call it. You don't need to immediately test every edge case, extreme array size, or input type combination. Those tests can come later if a real bug surfaces.
Don't over-engineer tests. Over-tested code is brittle - every refactor breaks dozens of tests that were testing implementation details nobody cares about. Write the 5 tests that cover 95% of real usage, not the 50 tests that make the code impossible to change. Use your judgment on what matters.
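The three usage-driven tests described above might look like the sketch below. Because Dataset4dstem needs a real data file, the sketch tests a hypothetical stand-in, resample_nearest, which uses plain nearest-neighbour index sampling on nested lists (not Fourier resampling) purely so the tests have something to run against. make_image is likewise a made-up helper.

```python
def resample_nearest(
    image: list[list[float]], new_shape: tuple[int, int]
) -> list[list[float]]:
    """Hypothetical stand-in: nearest-neighbour resampling of a 2D nested list."""
    old_rows, old_cols = len(image), len(image[0])
    new_rows, new_cols = new_shape
    return [
        [image[r * old_rows // new_rows][c * old_cols // new_cols] for c in range(new_cols)]
        for r in range(new_rows)
    ]


def make_image(rows: int, cols: int) -> list[list[float]]:
    """Build a small test image whose pixel value encodes its position."""
    return [[float(r * cols + c) for c in range(cols)] for r in range(rows)]


def test_resample_to_smaller_shape():
    result = resample_nearest(make_image(8, 8), (4, 4))
    assert (len(result), len(result[0])) == (4, 4)


def test_resample_to_larger_shape():
    result = resample_nearest(make_image(4, 4), (8, 8))
    assert (len(result), len(result[0])) == (8, 8)


def test_resample_returns_nested_list_of_floats():
    result = resample_nearest(make_image(4, 4), (2, 2))
    assert isinstance(result, list)
    assert all(isinstance(value, float) for row in result for value in row)


# pytest would collect the test_ functions automatically; here they are run by hand.
for test in (
    test_resample_to_smaller_shape,
    test_resample_to_larger_shape,
    test_resample_returns_nested_list_of_floats,
):
    test()
```

Three short tests cover the calls a user is most likely to make, and none of them depends on how the resampling is implemented internally.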
The goal is simple: set up the input, call the function, check the expected output.
import numpy as np
import torch
# to_numpy and Dataset4dstem come from quantem; their import lines are omitted here.
def test_to_numpy_from_torch():
    tensor = torch.tensor([1.0, 2.0, 3.0])
    result = to_numpy(tensor)
    assert isinstance(result, np.ndarray)
    assert np.allclose(result, [1.0, 2.0, 3.0])
def test_fourier_resample_preserves_scan_shape():
    ds = Dataset4dstem("scan.h5")
    ds_resampled = ds.fourier_resample((64, 64))
    assert ds_resampled.scan_shape == (64, 64)
Why do error messages matter? Scientists working in a Jupyter notebook don't want to stop, open the API documentation, and search for what went wrong. They want to look at the traceback, understand what happened, and fix it on the spot. A great error message is a guide: it tells the user exactly what they did wrong, shows them the actual value that caused the problem, and tells them how to correct it. The goal is that the scientist never has to leave their terminal or notebook to resolve the issue.
A scientist hits an error at 11pm before a deadline. If the message says "Invalid scan shape", they're stuck. If it says "You passed a 2D array with shape (256, 256), but Dataset4dstem expects a 4D array. Pass scan_shape=(rows, cols) to reshape it", they fix it in 30 seconds and move on. Good error messages turn a support request into a self-service fix.
A great error message has two parts: (1) what the user did wrong and (2) a potential next step that fixes it. Include the actual value that caused the error so the user doesn't have to guess.
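As a runnable sketch of the two-part pattern, the hypothetical validator below reports the value the user passed and then lists the valid choices. check_method and the contents of VALID_METHODS are illustrative names and values, not quantem's actual API.

```python
# Hypothetical set of supported options, for illustration only.
VALID_METHODS = {"lanczos", "linear", "nearest"}


def check_method(method: str) -> str:
    """Return method unchanged if it is supported, otherwise raise a helpful error."""
    if method not in VALID_METHODS:
        raise ValueError(
            # Part 1: what the user did, including the actual value.
            f"You entered method='{method}', which is not supported. "
            # Part 2: the next step that fixes it.
            f"Choose from: {', '.join(sorted(VALID_METHODS))}."
        )
    return method


try:
    check_method("cubic")
except ValueError as error:
    message = str(error)
    print(message)
```

Sorting the options keeps the message stable from run to run, which also makes it easy to assert on in a test.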
# Wrong - tells the user nothing
raise ValueError("Invalid scan shape")
# Wrong - says what's wrong but not how to fix it
raise ValueError(f"Got shape {data.shape}, expected 4D array")
# Right - talks directly to the user and tells them what to do
raise ValueError(
    f"You passed a {data.ndim}D array with shape {data.shape}, "
    f"but Dataset4dstem expects a 4D array "
    f"(scan_rows, scan_cols, det_rows, det_cols). "
    f"If your data is a flattened scan, pass scan_shape=(rows, cols) "
    f"to Dataset4dstem() to reshape it."
)
# Right - shows what they entered and lists valid options
raise ValueError(
    f"You entered method='{method}', which is not supported. "
    f"Choose from: {', '.join(sorted(VALID_METHODS))}."
)
Why (row, col)? Our coordinate convention is grounded in the physical geometry of the microscope. In scanning electron microscopy, the beam scans left-to-right (fast scan direction, columns) and top-to-bottom (slow scan direction, rows). This is exactly how NumPy lays out a 2D array: array[row, col], where row is the slow axis and col is the fast axis. Our API uses (row, col) because it matches both the physics of the scan and the indexing you already use in NumPy. If we used (x, y) instead, you'd have to mentally swap axes every time you go between quantem and your arrays. That swap is exactly the kind of silent bug that produces wrong results without raising an error.
All user-facing coordinates use (row, col). The first value is always the row (slow scan, top-to-bottom), the second is the column (fast scan, left-to-right).
This applies everywhere:
- Function/method parameters: center=(row, col), not center=(x, y).
- Return values and dicts: {"row": r, "col": c}, not {"x": x, "y": y}.
- Display and print output: coordinates shown as (row, col) in readouts, summary(), and error messages.
- Variable names: pos_row/pos_col, roi_row/roi_col, not pos_x/pos_y.
Internal drawing code (canvas pixel positions, matplotlib plt.scatter(x, y), DOM events) can use x/y since those are screen coordinates, not image coordinates. But anything exposed to the user must use (row, col).
A common bug: you plot a coordinate with plt.scatter(row, col) and the point appears in the wrong place. Matplotlib expects (x, y), which is (col, row) in our convention. The table below shows the mapping:
| System | Vertical axis | Horizontal axis | Notes |
|---|---|---|---|
| Our API | row | col | User-facing, everywhere; written (row, col) |
| NumPy | axis 0 (row) | axis 1 (col) | array[row, col], same order as our API |
| Matplotlib | y | x | plt.scatter(col, row); argument order is swapped relative to our API |
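To make the mapping concrete, here is a small sketch that keeps (row, col) in user-facing code and swaps only at the plotting boundary. Position, to_xy, and brightest_pixel are hypothetical names used for illustration. The image is a plain nested list so the example has no dependencies, but the indexing order matches NumPy's array[row, col].

```python
from typing import NamedTuple


class Position(NamedTuple):
    """A user-facing image coordinate, always (row, col)."""

    row: int
    col: int

    def to_xy(self) -> tuple[int, int]:
        """Swap to (x, y) for screen-style APIs such as plt.scatter(x, y)."""
        return (self.col, self.row)


def brightest_pixel(image: list[list[float]]) -> Position:
    """Return the (row, col) of the maximum value in a 2D nested list."""
    row, col = max(
        ((r, c) for r in range(len(image)) for c in range(len(image[0]))),
        key=lambda rc: image[rc[0]][rc[1]],
    )
    return Position(row=row, col=col)


# 3 rows x 4 columns; the bright pixel sits at row 2, col 3 (bottom-right).
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 9.0],
]

peak = brightest_pixel(image)
x, y = peak.to_xy()
print(peak)    # Position(row=2, col=3)
print((x, y))  # (3, 2)
```

Doing the swap in one named place, rather than at every call site, makes it much harder to pass (row, col) to a function that expects (x, y).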
Why type hints? Type hints let you understand the shape and structure of what a function expects and returns without reading the body. They catch bugs before runtime. When you type dataset.fourier_resample( in VS Code, type hints tell the editor what arguments are expected, so it can autocomplete and flag mistakes immediately.
They also keep variable names clean. Without type hints, you end up encoding the type into the name: stem_dataset_np_array, scan_shape_tuple. With type hints, you can just write data or scan_shape because hovering in VS Code already shows you it's an np.ndarray or tuple[int, int]. The type information lives in the signature, not cluttering the name.
We target Python 3.11+. Use built-in generics and the X | Y union syntax. Do not import Optional, List, Dict, Tuple, Union, or Any from typing. These imports have been unnecessary since Python 3.10.
# Wrong - legacy typing imports
from typing import Optional, List, Dict, Tuple, Union, Any
def load(path: Optional[str] = None) -> List[Dict[str, Any]]:
    ...
def process(data: Union[np.ndarray, torch.Tensor]) -> Tuple[int, int]:
    ...
# Right - built-in syntax (Python 3.11+)
def load(path: str | None = None) -> list[dict[str, object]]:
    ...
def process(data: np.ndarray | torch.Tensor) -> tuple[int, int]:
    ...
For methods that return self (method chaining), use Self from typing:
from typing import Self
def roi_circle(self, radius: float | None = None) -> Self:
    ...
    return self
Type hints go on public API first. Internal methods (starting with _) don't strictly need them, but they're simple to add and can help with readability.
Now it's your turn. Contributing should be enjoyable. It will take dozens of PR iterations to get comfortable, just like learning to drive the microscope. Practice and feedback are the gifts that we have. One of the best ways to improve is to keep seeking feedback, iterate, and learn how to communicate effectively through PRs and code. If you're uncertain about how to write a PR, coding standards, or anything in this guide, feel free to reach out to @bobleesj. You can always make a pull request to your own fork (not upstream) and tag @bobleesj to review your code and communication style.



