Contributing to open source projects on GitHub

Welcome! This guide teaches you how to contribute to open source using a fork-based workflow. We use quantem as an example, but these patterns work for any project. The goal: you'll never lose work, your collaborators will always know what you're doing, and your code history will tell a coherent story. Contributing to open source is one of the most rewarding ways to grow as a scientist and developer. Writing code to a high standard can feel difficult and overwhelming at first, but it is a great investment in your career. The Git and GitHub skills you build here will serve you for a lifetime.

If you have any questions or find instructions unclear, reach out to @bobleesj.

Quick start

| Case | When to use | PR target |
|------|-------------|-----------|
| 1 | Initial/private algorithm; colleague wants to test and contribute | `<username>/quantem:<branch>` |
| 2 | Large features, multi-person collaboration | `electronmicroscopy/quantem:<branch>` |
| 3 | Typos, small bugs, documentation | `electronmicroscopy/quantem:dev` |

How do I set up my computer? (5-10 minutes)

One-time computer setup

Install Git: https://git-scm.com/
Windows user? Install Git Bash: https://gitforwindows.org/
Set up SSH for GitHub so you don't have to enter your password every time (see Appendix C).
Install GitHub CLI: https://cli.github.com/

Fork setup

Go to https://github.com/electronmicroscopy/quantem and click Fork.

Clone your fork:

git clone https://github.com/<your-username>/quantem.git
cd quantem

Add the upstream remote:

git remote add upstream https://github.com/electronmicroscopy/quantem.git

Verify remotes with git remote -v. You should see both origin (your fork) and upstream (the org repo).
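For reference, `git remote -v` prints two lines per remote (one for fetch, one for push). The sketch below parses a sample of that output and checks that both remotes are present. The helper name `parse_remotes` and the sample string are illustrative assumptions, not part of Git or quantem.

```python
def parse_remotes(remote_v_output: str) -> dict[str, str]:
    """Map each remote name to its URL from `git remote -v` output."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()  # name, URL, "(fetch)" or "(push)"
        if len(parts) >= 2:
            remotes[parts[0]] = parts[1]
    return remotes


# Sample output after completing the fork setup (placeholder username).
sample = """\
origin\thttps://github.com/<your-username>/quantem.git (fetch)
origin\thttps://github.com/<your-username>/quantem.git (push)
upstream\thttps://github.com/electronmicroscopy/quantem.git (fetch)
upstream\thttps://github.com/electronmicroscopy/quantem.git (push)
"""

remotes = parse_remotes(sample)
missing = {"origin", "upstream"} - set(remotes)
print("missing remotes:", missing or "none")
```

If `upstream` is reported missing, rerun the `git remote add upstream` step above.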

Development environment setup

For installing quantem in development mode with uv, setting up pre-commit hooks, and managing dependencies, see the quantem CONTRIBUTORS.md.

Making your first contribution

There are three common cases when contributing to quantem:

Case 1: Prototyping on your fork - initial/private algorithm development; colleague helps test
Case 2: Major feature development - larger features requiring collaboration on upstream
Case 3: Quick fixes - typos, small bugs, small features, documentation updates

Case 1: Prototyping on your fork

Assume Will is prototyping a new alignment algorithm on his personal fork. The code isn't ready for the main repository yet, but Will asks Colin for help testing and improving it.

Here's an overview. Follow the steps below first, then use this diagram as a reference:

Will starts the prototype

Will checks out his local dev branch and pulls the latest from upstream (after cloning, the local dev branch has the same commit history as the dev branch on his fork):
```
git checkout dev
git pull upstream dev
```
Will creates a local branch off of the latest commits from dev:
```
git checkout -b align
```
Will makes changes, commits, and uploads the local branch to his fork:
```
git add <files>
git commit -m "Add initial alignment algorithm"
git push -u origin align
```
Note: The -u flag sets upstream tracking. You only need it the first time you push a new branch. After that, git push is sufficient.

Will continues iterating on wwmills/quantem:align. Now Colin wants to contribute and test Will's code. How does Colin contribute?

Colin joins to help

Colin adds Will's fork URL so he can fetch Will's latest commits:
```
git remote add will https://github.com/wwmills/quantem.git
```
Colin fetches Will's commits and checks out Will's branch:
```
git fetch will
git checkout will/align
```
Colin creates a local branch off of Will's align branch:
```
git checkout -b align-subpixel
```

Colin makes changes and commits:

git add <files>
git commit -m "Add subpixel alignment support"

Colin uploads his local align-subpixel branch to Colin's fork (https://github.com/cophus/quantem):
```
git push -u origin align-subpixel
```
Colin visits https://github.com/wwmills/quantem and clicks the green Compare & pull request button to create a PR from cophus/quantem:align-subpixel to wwmills/quantem:align. Colin follows the guidelines in Making the pull request review process effective.

What happens next?

Will reviews and merges Colin's PR into wwmills/quantem:align.
When the feature is ready, Will creates a PR from wwmills/quantem:align to electronmicroscopy/quantem:dev.
A maintainer reviews and merges it. Colin's commits are preserved in the contribution history.

Note: If the PR to quantem/dev would be too large (e.g., thousands of lines), consider using Case 2 instead. We don't want to overwhelm core reviewers with massive PRs. Case 2 creates a feature branch on upstream where multiple people can contribute iteratively with smaller, reviewable PRs before merging to quantem/dev.

Case 2: Major feature development

Here we use the example of Bob. Bob is building drift correction for quantem using PyTorch to speed up computation. This is an actual workflow Bob uses to collaborate with Will. The feature is too large for a single PR and requires multiple iterations before it's ready for quantem/dev. How does Bob collaborate with Will so they can build and contribute collectively?

Here's an overview. Follow the steps below first, then use this diagram as a reference:

Creating the branch on upstream (maintainer)

Go to https://github.com/electronmicroscopy/quantem.
Click the branch dropdown (shows quantem/dev), type the new branch name (e.g., drift-torch), and click Create branch: drift-torch from quantem/dev.

Contributing to the branch

When multiple people contribute to a branch on upstream, each person creates local branches and PRs iteratively.

Bob fetches the branch and pulls the latest:

git fetch upstream
git checkout drift-torch
git pull upstream drift-torch

Bob creates a branch named after the specific feature:
```
git checkout -b drift-torch-rigid
```

Bob makes changes and commits:

git add <files>
git commit -m "Add rigid transformation for drift correction"

Bob uploads his local drift-torch-rigid branch to his fork (origin):
```
git push -u origin drift-torch-rigid
```
Bob visits https://github.com/electronmicroscopy/quantem and clicks the green Compare & pull request button to create a PR from bobleesj/quantem:drift-torch-rigid to electronmicroscopy/quantem:drift-torch (not dev). Bob follows the guidelines in Making the pull request review process effective.

For Bob's next contribution, Bob does not branch off from drift-torch-rigid. Instead, Bob starts from the latest upstream/drift-torch branch. It is the source of truth and contains merged commits from all contributors:

Bob switches to the branch and pulls the latest:

git checkout drift-torch
git pull upstream drift-torch

Bob creates a new branch for the next feature:
```
git checkout -b drift-torch-affine
```

Bob makes changes, commits, and uploads his local drift-torch-affine branch to his fork:

git add <files>
git commit -m "Add affine transformation for drift correction"
git push -u origin drift-torch-affine

Bob visits https://github.com/electronmicroscopy/quantem and clicks the green Compare & pull request button to create a PR from bobleesj/quantem:drift-torch-affine to electronmicroscopy/quantem:drift-torch. Bob follows the guidelines in Making the pull request review process effective.

Another contributor joins (Will)

Will wants to contribute to the same branch. Will sees drift-torch, which contains the latest commits merged from drift-torch-rigid and drift-torch-affine. Will has experimental data and tests the code across multiple dimensions, debugging and fixing issues from previous commits:

Will fetches the branch and pulls the latest:

git fetch upstream
git checkout drift-torch
git pull upstream drift-torch

Will creates a branch for adding tests:
```
git checkout -b drift-torch-test
```

Will makes changes, commits, and uploads his local drift-torch-test branch to his fork:

git add <files>
git commit -m "Add unit tests for drift correction"
git push -u origin drift-torch-test

Will visits https://github.com/electronmicroscopy/quantem and clicks the green Compare & pull request button to create a PR from wwmills/quantem:drift-torch-test to electronmicroscopy/quantem:drift-torch. Will follows the guidelines in Making the pull request review process effective.

Will continues with drift-torch-validate, drift-torch-large-images, etc. Both Bob and Will can contribute simultaneously. The feature-based naming (-rigid, -test, -validate) keeps everyone's work organized and descriptive.

Tip: To avoid merge conflicts, communicate with your team before starting work. Ideally, two people should not edit the same file at the same time. A quick message ("I'm working on drift.py") helps prevent conflicts.

When the feature is complete and tested, either Bob or Will can create a PR from electronmicroscopy/quantem:drift-torch to electronmicroscopy/quantem:dev following the pull request guidelines. A maintainer reviews and merges it.

Case 3: Quick fixes (typos, small bugs, small features)

You spotted a typo in the README. The fix is small enough that it can go directly to quantem/dev as a single PR. How do you contribute it?

Here's an overview. Follow the steps below first, then use this diagram as a reference:

Switch to your local dev branch (it already exists after cloning) and pull the latest commits from upstream/dev:
```
git checkout dev
git pull upstream dev
```
Create a local branch called fix-readme-typo:
```
git checkout -b fix-readme-typo
```
Edit README.md using your favorite IDE.

Stage and commit the changes:

git add README.md
git commit -m "Fix typo in README installation section"

Upload your local fix-readme-typo branch to your fork (origin):
```
git push -u origin fix-readme-typo
```
Verify the branch exists on GitHub by visiting https://github.com/<your-username>/quantem/tree/fix-readme-typo.
Go to https://github.com/electronmicroscopy/quantem and click the green Compare & pull request button to create a PR from <your-username>/quantem:fix-readme-typo to electronmicroscopy/quantem:dev. Follow the guidelines in Making the pull request review process effective.

GitHub issues and pull requests

Why do we write GitHub issues?

GitHub issues are where we discuss what to build, report bugs, and converge on design decisions before writing code. Not everyone can join the bi-weekly dev meeting, but everyone can participate in a GitHub issue. It's a democratized platform where anyone can propose ideas, ask questions, and collaborate regardless of time zone or schedule. Every issue has a permanent URL that anyone can reference months or years later to understand why a feature was designed a certain way, who contributed to the discussion, and what alternatives were considered.

We encourage you to open issues freely. You don't need to have a solution to start a conversation. Before implementing a new feature, open an issue first. This gives the team a chance to discuss the problem, explore potential solutions, and align on an approach. It also engages potential users early. See Issue #149 for an example where the team discussed the 5D-STEM dataset design before implementation.

Tag people with @username to send direct notifications.

An issue can be divided into two sections:

### Problem

[Describe the problem or feature request]

### Proposed solutions

[Describe possible approaches]

If you're reporting a bug, you don't have to propose a solution. Reporting the problem is valuable on its own. Attach screenshots, error messages, and metadata (Python version, OS, package versions) as needed to help with debugging. If you're proposing a new feature, put effort into the proposed solutions section. Include screenshots, API design, and use cases so that potential reviewers can read it and brainstorm with you.
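To gather the metadata mentioned above in one go, something like the following sketch may help. The function name `environment_report` and the package list are assumptions for illustration; adjust the list to whatever your bug involves.

```python
import platform
from importlib import metadata


def environment_report(packages: list[str]) -> dict[str, str]:
    """Collect version details that are useful to paste into a bug report."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages relevant to your issue.
    for key, value in environment_report(["quantem", "numpy", "torch"]).items():
        print(f"{key}: {value}")
```

Paste the printed lines into the issue body next to the error message.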

Examples:

Issue #136 - bug report (Python 3.13 type alias compatibility)
Issue #138 - feature request (quantem.__version__ support)
Issue #149 - feature design discussion (pre-implementation alignment)
Issue #105 - architecture discussion (whether to add Widget module)

Making the pull request review process effective

When a PR addresses an issue, use Closes #<issue-number> in the PR body. Once merged, the issue will automatically close (see PR #151).

Start as a Draft PR while work is in progress.
Write a short, descriptive title (see How do I write great pull request and issue titles?).
In the body, showcase the problem we are solving. Attach screenshots, plots, and design visuals. Our reviewers are colleagues here to see how we use Python to solve a scientific problem.

Every PR has a public URL. The more accessible we make it through visuals and clear writing, the more people can give us feedback without running anything. More reviewers means more input, more potential users, and more impact for our code. I have found that a PR can be effective with the following minimal framework:
```
### What problem does this PR address?

Closes #<issue-number>

[Describe the problem this PR solves. Focus on inputs/outputs.
Attach screenshots, plots, and before/after comparisons.]

[Show the function signatures, class interfaces, and how a scientist
would use this in a notebook. The API is what a scientist actually
types. Get this right first.]

### What should the reviewer(s) do?

[Explain how to test, what to look for, or any dependencies on other PRs]
```

Note: Writing represents our internal state as the author. The goal is to externalize our reasoning so the team can make decisions collectively. We write as little as required, but as much as needed. Together with its commits, a PR serves as lasting documentation for developers and for debugging. Reviewing takes time, and writing effectively respects the reviewer's time so we can prioritize advancing science.

Before tagging a reviewer, go to Files changed and review our own code. We'll catch mistakes and save everyone time. When ready, tag the reviewer and say "Ready for review."
After receiving feedback, optionally turn the reviewer's comments into a checklist and check off items. This respects the reviewer's input, acknowledges their feedback, and serves as a to-do list for everyone. If the list is too long or beyond scope, turn it into a GitHub issue for later tracking. After making changes, tag the reviewer again with "Ready for review." See PR #146 for an example where George and Colin provided design feedback, and this comment for an example checklist.
Leave inline comments on your own PR to guide the reviewer. Don't make the reviewer guess why you made a decision. Add comments on your own diff pointing out non-obvious choices, trade-offs, or areas where you want specific feedback. This saves a round trip and shows you've thought it through.

For reviewers: We invite you to review others' work. It's one of the fastest ways to learn the codebase. Reviewing maintains our standards and ensures that code is understood by more than one person. When only the author understands the code, that's a weak link. A thoughtful review reduces those weak links, catches bugs before they reach a scientist's notebook, and helps the author grow. Respect the author's time by being specific, and respect your own time by not re-reviewing things pre-commit already handles.

First understand the nature of the problem and how solving it benefits scientists and the community, before looking at the design, the code, or the technology. If the problem is unclear, the author should open an issue and discuss what problem the PR solves before adding more commits.
Check out the branch locally and run the tests.
Focus on correctness, API design, test coverage, and docstrings. Don't nitpick formatting if pre-commit handles it.
Be specific. "This will break if scan_shape has an odd dimension" is useful. "Needs work" is not.
Approve when it's good enough, not perfect. If there are remaining concerns, discuss whether another iteration is needed or encourage the author to create an issue to address potential bugs and improvements down the road.
If you request changes, say what "done" means to you as a reviewer.

Examples:

PR #146 - visual/UI focused (widget, screenshots)
PR #151 - data structure/API design focused

How do I resolve disagreements between reviewers?

We have a common goal of advancing science. When reviewers disagree, you may adopt the following steps:

Discuss the code, not the coder. "This approach has O(n^2) complexity" is fine. "You always write slow code" is not.
Try to resolve it between yourselves first. Use inline comments on the PR. If needed, follow up with a private message or a quick call.
If stuck, bring in a third person. A fresh perspective breaks deadlocks.
If still unresolved, present both approaches at the bi-weekly dev meeting. The higher-impact idea wins, not the louder voice or seniority.
Once decided, commit fully. No passive resistance. Code is always evolving. The decision may not be optimal right now, but it can change. If you have a strong reason to revisit it later, bring it up again.

The best way to avoid disagreements at the PR level is to not have them in the first place. This is why we write issues and communicate our design and problems clearly before writing code. A PR should be an implementation of a plan the team already agrees on.

How do I write great pull request and issue titles?

Why do titles matter? Our goal is to advance science, not spend time debugging. Every hour someone spends tracing a regression through vague commit messages is an hour not spent on research. PR titles end up in git log, release notes, and blame annotations. When someone runs git log --oneline a year from now to find where a bug was introduced, "Infrastructure changes" is a dead end. "Fix hot-pixel filter zeroing valid data on Arina datasets" points them straight to the answer. The time you invest writing a clear title once saves the entire team time forever.

This is part of a broader philosophy: once your code works in your fork and you're past the prototype stage, it's often worth spending time upfront on clear communication and good infrastructure. Prototyping is fast and messy by design. But when code moves to upstream, it becomes shared property. Clear titles, descriptions, and commit messages let the group make decisions collectively and trace problems back to their origin.

A PR title is the first thing a reviewer reads and the last thing a debugger searches.

Format: Start with a verb. Cover what the change is and answer "so what?" (why it matters).

# Bad - what file, no context
Updated vector.py
Infrastructure changes
Bug fix

# Good - says what, but not why it matters
Add cell-level indexing to Vector
Fix hot-pixel filter
Remove deprecated parameter

# Great - what changed and so what
Fix hot-pixel filter zeroing valid data on Arina SNSF datasets
Rename rotation_angle to rotation_angle_deg for explicit degree input in direct ptycho

Issue titles follow the same pattern. Every title must cover two things: (1) what's happening (bug) or what needs to be done (feature) and (2) so what?

# Bad - vague, no context
Problem with dataset
Question about API

# Great
Dataset4dstem.fourier_resample crashes on odd scan dimensions, blocking 3D reconstruction
Add quantem.__version__ so users can report their version in bug reports
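As a rough illustration of the title guidance above, the sketch below flags a few of the patterns shown in the bad examples. The function name `title_problems`, the phrase list, and the thresholds are all assumptions made for this example; they are heuristics, not a quantem tool or a GitHub feature.

```python
# Phrases taken from the vague examples in this guide (exact-match list, an assumption).
VAGUE_TITLES = {
    "bug fix",
    "infrastructure changes",
    "problem with dataset",
    "question about api",
}


def title_problems(title: str) -> list[str]:
    """Return reasons a title may be too vague (empty list if none found)."""
    problems = []
    words = title.split()
    if title.strip().lower() in VAGUE_TITLES:
        problems.append("names a category, not the change")
    if len(words) < 4:
        problems.append("too short to say what changed and why it matters")
    # Crude tense check: "Updated", "Fixing" instead of "Update", "Fix".
    if words and words[0].lower().endswith(("ed", "ing")):
        problems.append("start with an imperative verb such as Fix, Add, or Rename")
    return problems


for title in [
    "Updated vector.py",
    "Fix hot-pixel filter zeroing valid data on Arina SNSF datasets",
]:
    print(title, "->", title_problems(title) or "looks fine")
```

The tense check is deliberately crude (it would also flag a verb such as "Embed"), so treat the output as a prompt to reread the title, not as a verdict.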

How do I check out someone's pull request?

GitHub CLI (gh) provides shortcuts for common tasks. See GitHub CLI installation if not already installed.

List open pull requests:

gh pr list

Check out a specific PR by number:

gh pr checkout 146

This creates a local branch with the PR's changes so you can test or review the code.

You may also use gh issue list and gh issue view <number> to view issues from the command line. For more commands, see the GitHub CLI documentation.

Guidelines

We encourage everyone to contribute early and often. Everyone on this team is balancing research, coursework, and life. The time someone spends reviewing our code, debugging our error message, or deciphering our commit history is time they could have spent on science. Guidelines are here to respect our collective time:

We focus on the code, not the person. We keep feedback constructive and stay neutral in PRs and issues. We provide feedback to help each other improve.

We stay accountable and responsive. A reviewer waiting on your reply can't move forward. If you need more time, say so. "I'll address this by Thursday" is better than silence. Fast iteration keeps everyone unblocked.

We maintain balanced code quality. We follow NumPy docstring conventions and PEP 8 standards, and provide example notebooks and tests where necessary.

We ship, but not broken code. Prototype freely in your fork or a branch on upstream. Unoptimized code is fine. Broken code stays in the fork until it's fixed.

We align before we build. Before taking on a large PR, bring it up at the bi-weekly quantem meeting or open a GitHub issue. Ten minutes of discussion can save weeks of wasted work on something the team doesn't need or would design differently.

Common mistakes to avoid

Don't commit directly to your local dev branch. Those commits won't match upstream/dev, and you'll need to reset. Instead, always branch: git pull upstream dev then git checkout -b <branch-name>.

Don't stage everything blindly. Use git add <specific-files> instead of git add . to avoid accidentally committing .env files, API keys, or large datasets. Before submitting a PR, check Files changed to confirm only intended files are included.

Don't force push to shared branches. In Case 2, multiple people contribute to upstream/drift-torch. Force pushing can erase a teammate's work. Use PRs to merge changes instead.

Don't let upstream feature branches live too long. The longer a branch diverges from dev, the more merge conflicts accumulate and the harder the final merge becomes. If possible, aim to merge feature branches into dev within a few weeks, not months.

Don't name branches too broadly. imaging is an entire module, not a feature. Name branches after the specific feature: imaging-cellview, imaging-roi-export, etc. Use clear, consistent naming as shown in Case 2 (drift-torch-rigid, drift-torch-affine, drift-torch-test).

Don't start a large PR without alignment. See "We align before we build" in Guidelines.

Don't mix unrelated changes in one PR. One PR, one purpose: a feature, a refactor, or a bug fix. When unrelated changes get bundled together, reviewers spend more time untangling what changed than evaluating whether it's correct.

Don't leave review comments unanswered. Respond to each comment saying whether it's been addressed or is out of scope. Unanswered comments leave reviewers guessing and slow down the next round.

Don't over-engineer tests. Every test a human has to review, maintain, and debug costs real time. More lines of code is not better. Respect human time above anything else. Write tests that cover how scientists actually use the code, not every possible edge case. See D2. How do I write effective tests? for details.

Don't skip the Examples section in docstrings. Most public functions should have at least one usage example. See D1. How do I write effective docstrings? for details.

Don't write cryptic error messages. Error messages should guide the user on what to do next without digging into the entire codebase. See D3. How do I write great error messages? for details.

Don't create your own coordinate system. Use the (row, col) convention in quantem. See D4. How do I represent coordinates in NumPy, Matplotlib, and quantem? for details.

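Don't use legacy type hints. Optional, List, Dict, Tuple, and Union from typing are no longer needed in Python 3.11+: write X | None, list, dict, tuple, and X | Y instead. Any has no built-in replacement and is still imported from typing. See D5. How do I use type hints? for details.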

Don't invent your own coding conventions. This includes comments, line spacing, and formatting. We follow NumPy docstring conventions and PEP 8 standards.

Coding standards

Prototyping is fast and messy by design, and you may even submit scientific code and publish a paper with it. But we make it open source to increase impact across the community. We encourage you to experiment freely in your fork. But the moment code hits upstream, it becomes shared responsibility. Everyone's time is extremely valuable. Putting the effort into clear, well-structured code respects every scientist who will read, review, and build on it. The code represents our standards, and we want those standards to help scientists. See Appendix D: Coding standards for the details.

Troubleshooting

I accidentally committed to dev instead of a new branch

If you haven't pushed yet:

git reset --soft HEAD~1

This undoes the commit but keeps your changes staged. Then create your new branch and recommit:

git checkout -b my-feature
git commit -m "Your message"

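If you already pushed to your fork's dev, first save the commit on a new branch, then reset dev to match upstream:

git checkout dev
git branch my-feature            # Keep the commit on a new branch
git fetch upstream               # Make sure upstream/dev is current
git reset --hard upstream/dev    # Discards commits and local changes on dev
git push origin dev --force

Your commit is now safe on my-feature. Switch to it with git checkout my-feature and continue working there.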

I have merge conflicts

Merge conflicts happen when two people edit the same lines. Git marks conflicts like this:

<<<<<<< HEAD
your changes
=======
their changes
>>>>>>> branch-name

To resolve:

Open the conflicted file and look for <<<<<<< markers.
Decide which version to keep (or combine both).
Remove the conflict markers (<<<<<<<, =======, >>>>>>>).

Stage and commit:

git add <file>
git commit -m "Resolve merge conflict in <file>"
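Before staging, it can help to confirm that no markers were left behind. The sketch below scans text for the three marker patterns shown above. The helper name `find_conflict_markers` is an assumption for illustration. Git's own `git diff --check` also warns about leftover conflict markers.

```python
def find_conflict_markers(text: str) -> list[int]:
    """Return 1-based line numbers that look like Git conflict markers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line.rstrip() == "=======":
            hits.append(lineno)
    return hits


# Sample file content with an unresolved conflict.
sample = "\n".join([
    "import numpy as np",
    "<<<<<<< HEAD",
    "your changes",
    "=======",
    "their changes",
    ">>>>>>> branch-name",
])
print(find_conflict_markers(sample))  # [2, 4, 6]
```

A line consisting only of `=======` can also be a legitimate heading underline in text files, so review each hit and do not delete lines automatically.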

Prevention tip: Communicate with your team. If two people need to edit the same file, coordinate timing or split the work into different sections.

I made changes but Git says "nothing to commit"

Common causes:

You're in the wrong directory. Run pwd and make sure you're inside the repository.
Changes aren't saved. Save your files in your editor.
Files are gitignored. Check .gitignore to see if your file pattern is excluded.
You already committed. Run git log to see recent commits.

I want to undo my last commit

If you haven't pushed yet:

git reset HEAD~1 --soft    # Keeps changes staged
git reset HEAD~1 --mixed   # Keeps changes unstaged (default)
git reset HEAD~1 --hard    # Discards changes entirely (careful!)

If you already pushed, it's safer to make a new commit that reverses the changes:

git revert HEAD            # Creates a new commit undoing the last one
git push origin <branch>

Git says "Your branch is behind" when I try to push

The upstream branch has new commits. Pull the latest first:

git pull upstream <branch>

If there are conflicts, resolve them (see merge conflicts above), then push.

Appendix A: Glossary

What is the difference between remote and local?

Remote is a repository hosted on GitHub (under your account or an organization). Local is the copy on your computer.

Example: https://github.com/electronmicroscopy/quantem is remote. The folder on your computer after git clone is local.

What is the difference between fork and clone?

Fork creates a personal copy on GitHub under your account. Clone downloads a repository to your local machine.

Example: Clicking "Fork" on GitHub creates https://github.com/bobleesj/quantem. Running git clone downloads it to your computer.

What is the difference between origin and upstream?

Origin points to your fork. Upstream points to the original repository.

Example: origin = https://github.com/bobleesj/quantem. upstream = https://github.com/electronmicroscopy/quantem.

What is the difference between branch and fork?

A branch is a parallel version of code within the same repository. A fork is a copy of the entire repository under a different account.

Example: quantem/dev and quantem/drift are branches. bobleesj/quantem is a fork of electronmicroscopy/quantem.

What is the difference between pull and fetch?

Fetch downloads changes from a remote but doesn't apply them. Pull fetches and merges the changes into your current branch.

Example: git fetch upstream downloads updates. git pull upstream dev downloads and merges into your local dev.

What is the difference between git add and git commit?

git add stages changes (prepares them to be committed). git commit saves the staged changes to your local repository.

Example: git add file.py stages the file. git commit -m "Add feature" saves it with a message.

What is the difference between git pull and git push?

git pull downloads changes from a remote and merges them into your local branch. git push uploads your local commits to a remote.

Example: git pull upstream dev gets updates from upstream. git push origin fix-typo sends your commits to your fork.

What is a PR (pull request)?

A request to merge your changes from one branch into another.

Example: PR from bobleesj/quantem:fix-typo to electronmicroscopy/quantem:dev.

What is detached HEAD state?

When you check out a remote-tracking branch directly (like will/align), Git puts you in "detached HEAD" state. This means you're not on a local branch. You're viewing a snapshot of the remote.

Example from Case 1: When Colin runs git checkout will/align, he's in detached HEAD state. Any commits made here won't belong to a branch and could be lost. That's why Colin creates a local branch with git checkout -b align-subpixel. This saves his work to a proper branch.

How do I temporarily save uncommitted changes (git stash)?

When you have uncommitted changes and need to switch branches, Git will block the switch if it would overwrite those changes. Use git stash to temporarily save your changes, switch branches, and then restore them later with git stash pop.

git stash           # Save uncommitted changes
git checkout dev    # Switch to another branch
# ... do other work ...
git checkout my-branch
git stash pop       # Restore your saved changes

What is git push --force?

A normal git push fails if the remote branch has commits your local branch doesn't have. Git protects you from accidentally overwriting work. git push --force tells Git: "I know the histories don't match. Replace the remote with my local version anyway."

Force push is safe for your own branches on your fork (you're only affecting yourself), but dangerous for shared branches on upstream (you could delete teammates' commits). Use PRs instead of force pushing to shared branches.

Appendix B: Acronyms

PR - Pull Request
SSH - Secure Shell
CLI - Command Line Interface (e.g., Terminal, Warp, Git Bash)
IDE - Integrated Development Environment (e.g., VS Code, PyCharm)
API - Application Programming Interface
UI - User Interface
UX - User Experience
GUI - Graphical User Interface
CI/CD - Continuous Integration / Continuous Deployment (automated testing and deployment in the cloud, e.g., GitHub Actions)

Appendix C: SSH for GitHub

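SSH allows you to push and pull without entering your password every time. SSH keys are used only by remotes with an SSH URL (git@github.com:<your-username>/quantem.git). If you cloned with the HTTPS URL shown in Fork setup, switch after adding your key with git remote set-url origin git@github.com:<your-username>/quantem.git.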

In your terminal, run the following commands to generate a new SSH key pair. Replace <email@example.com> with your email address.

mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -o -t rsa -C "<email@example.com>"
cat id_rsa.pub

Visit https://github.com/settings/keys.
Click New SSH key.
Set the Title as <your-computer-name>-key.
Under Key, copy and paste the content of the id_rsa.pub file. It should start with ssh-rsa and end with your email address.
Click Add SSH key.
Done!

Appendix D: Coding standards

D1. How do I write effective docstrings?

Why write docstrings? They appear in three places: (1) VS Code and PyCharm show them when you hover over a function, so scientists get instant help without leaving their editor, (2) Sphinx pulls them into the official documentation automatically (see Show2D API docs for a live example), and (3) Jupyter Notebook shows them when you press Shift+Tab inside a function call. One docstring, three audiences, zero extra work.

We use NumPy-style docstrings. A docstring should answer two questions, in this order: (1) Why does this code exist? Every function solves a problem. State the problem first. (2) How does it work and why is it designed this way? Explain the approach and key design choices so contributors understand the reasoning, not just the interface. The reader is a scientist, not a code reviewer.

# Wrong - describes implementation
def roi_annular(self, inner_radius=None, outer_radius=None):
    """Create a boolean mask by computing pixel distances from the
    center using torch.cdist, then applying inner/outer thresholds."""

# Right - states the problem, then explains the design
def roi_annular(self, inner_radius=None, outer_radius=None):
    """Set ROI mode to annular for ADF/HAADF imaging.

    The annular ROI integrates over a donut-shaped region in the
    diffraction pattern. Use small inner radii for ADF, larger
    inner radii for HAADF. The virtual image updates immediately."""

When in doubt, ask: "Would a scientist in the lab care about this sentence?" If not, it belongs in a code comment, not the docstring.

Docstrings support mathematical notation via reStructuredText. Use :math:`E = mc^2` for inline math and .. math:: blocks for display equations. This is especially useful for functions that implement known formulas, so the reader can verify the implementation against the literature. When a docstring contains backslashes, make it a raw string (r""") so that Python does not interpret sequences such as the \f in \frac as escape characters.

def relativistic_wavelength(voltage_kv: float) -> float:
    r"""Compute the relativistic de Broglie wavelength of an electron

    At typical TEM accelerating voltages (80-300 kV), relativistic
    effects shorten the wavelength by several percent. This function
    uses the relativistic form:

    .. math::

        \lambda = \frac{h}{\sqrt{2 m_0 e V \left(1 + \frac{eV}{2 m_0 c^2}\right)}}

    Parameters
    ----------
    voltage_kv : float
        Accelerating voltage in kilovolts (e.g., 200 for a 200 kV TEM).

    Returns
    -------
    float
        Wavelength in angstroms.

    Examples
    --------
    >>> round(relativistic_wavelength(200), 5)
    0.02508
    """

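The docstring above omits the function body. A minimal sketch of a body that follows the formula in the docstring is shown here, assuming CODATA 2018 constants hard-coded in SI units; the constant names are illustrative and not part of quantem.

```python
import math

# CODATA 2018 values in SI units. Hard-coding them is an assumption made for
# this sketch; a real project might read them from a constants module instead.
PLANCK_H = 6.62607015e-34            # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
SPEED_OF_LIGHT = 299792458.0         # m / s


def relativistic_wavelength(voltage_kv: float) -> float:
    """Compute the relativistic de Broglie wavelength of an electron in angstroms"""
    # Kinetic energy eV gained by the electron, in joules.
    energy = ELEMENTARY_CHARGE * voltage_kv * 1e3
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT**2
    # Relativistic momentum: sqrt(2 m0 eV (1 + eV / (2 m0 c^2))).
    momentum = math.sqrt(
        2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy))
    )
    # h / p gives metres; multiply by 1e10 to convert to angstroms.
    return PLANCK_H / momentum * 1e10


print(round(relativistic_wavelength(200), 5))  # 0.02508
```

Writing the code term by term against the equation lets a reader check each factor against the docstring.
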
Structure

Short one-line summary (imperative mood, no period)

Longer description if needed. State the problem this
function solves, then explain the design. Supports
math notation: :math:`k = 2\pi / \lambda` for inline.

Parameters
----------
<name> : <type>
    What it represents and what units it uses.
<name> : <type>, optional
    What it controls. Default is <value>.

Returns
-------
<type>
    What the caller gets back.

Examples
--------
>>> <most common usage>

Complete example (non-widget function from quantem.core):

def fourier_resample(
    self,
    new_shape: tuple[int, int],
    method: str = "lanczos",
) -> Self:
    """Rescale scan dimensions without interpolation artifacts in real space

    Resampling in Fourier space avoids the ringing and aliasing that
    real-space interpolation introduces at sharp edges. This is the
    preferred way to change scan resolution after acquisition.

    Parameters
    ----------
    new_shape : tuple[int, int]
        Target (rows, cols) for the scan dimensions.
    method : str, optional
        Interpolation kernel. Default is "lanczos".

    Returns
    -------
    Self
        A new Dataset with the resampled scan dimensions.

    Examples
    --------
    >>> ds = Dataset4dstem("scan.h5")
    >>> ds_resampled = ds.fourier_resample((128, 128))
    >>> ds_resampled.scan_shape
    (128, 128)
    """

Key rules: Write name : type with spaces around the colon. Mark parameters that have defaults as optional. Start each line in Examples with the >>> prompt so Sphinx renders it correctly. Aim for 2-3 examples, with the most common use case first.
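Examples written with the >>> prompt can also be executed by Python's built-in doctest module, which is one way to keep them from going stale. The sketch below is illustrative only: circle_area and run_examples are hypothetical names, not part of quantem.

```python
import doctest
import math


def circle_area(radius: float) -> float:
    """Compute the area of a circle

    Parameters
    ----------
    radius : float
        Radius of the circle in pixels.

    Returns
    -------
    float
        Area in square pixels.

    Examples
    --------
    >>> round(circle_area(1.0), 4)
    3.1416
    """
    return math.pi * radius**2


def run_examples(func) -> doctest.TestResults:
    """Run the >>> examples in func's docstring and return (failed, attempted)"""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Expose the function under its own name so the examples can call it.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(circle_area)
print(results.failed, results.attempted)  # 0 1
```

Rounding the result in the example keeps the expected output stable across platforms.
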

Common docstring mistakes

Don't skip Examples. Even a one-liner is better than nothing.
Don't document _private methods. They don't appear in docs.
Don't repeat the type hint. radius : float → "Radius of the circle in pixels", not "A float specifying the radius..."
Don't describe implementation. "Compute the virtual image for the current ROI", not "sum pixels in the ROI mask using torch.sum."

D2. How do I write effective tests?

Why write tests? Tests mimic how a scientist actually uses the code. Every test is a real scenario: construct the object, call the method, check the result. This does two things: (1) it catches regressions before they reach a scientist's notebook, and (2) it shows other contributors how the code is meant to be used. Docstrings explain why a function exists; tests show how it runs. GitHub Actions runs pytest on every PR, so a failing test blocks the merge before it can affect anyone else.

What to test

Test the way a scientist actually uses the code. If you wrote fourier_resample, the most important tests are: resample to a smaller shape, resample to a larger shape, check that the output type is correct. That's how scientists will call it. You don't need to immediately test every edge case, extreme array size, or input type combination. Those tests can come later if a real bug surfaces.

Don't over-engineer tests. Over-tested code is brittle - every refactor breaks dozens of tests that were testing implementation details nobody cares about. Write the 5 tests that cover 95% of real usage, not the 50 tests that make the code impossible to change. Use your judgment on what matters.

Examples

The goal is simple: set up the input, call the function, check the expected output.

import numpy as np
import torch

# to_numpy and Dataset4dstem come from quantem; their import lines are omitted here.


def test_to_numpy_from_torch():
    tensor = torch.tensor([1.0, 2.0, 3.0])
    result = to_numpy(tensor)
    assert isinstance(result, np.ndarray)
    assert np.allclose(result, [1.0, 2.0, 3.0])


def test_fourier_resample_sets_scan_shape():
    ds = Dataset4dstem("scan.h5")
    ds_resampled = ds.fourier_resample((64, 64))
    assert ds_resampled.scan_shape == (64, 64)
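The three usage-driven tests suggested under "What to test" (smaller shape, larger shape, output type) can be sketched without quantem or a data file. Here resample_nearest is a hypothetical pure-Python stand-in, not the real fourier_resample; only the test pattern is the point.

```python
def resample_nearest(
    image: list[list[float]], new_shape: tuple[int, int]
) -> list[list[float]]:
    """Resample a 2D image to new_shape by nearest-neighbour lookup (stand-in)"""
    rows, cols = len(image), len(image[0])
    new_rows, new_cols = new_shape
    return [
        [image[r * rows // new_rows][c * cols // new_cols] for c in range(new_cols)]
        for r in range(new_rows)
    ]


def make_image(rows: int, cols: int) -> list[list[float]]:
    """Build a small synthetic image so tests do not depend on a file on disk"""
    return [[float(r * cols + c) for c in range(cols)] for r in range(rows)]


def shape_of(image: list[list[float]]) -> tuple[int, int]:
    return (len(image), len(image[0]))


def test_resample_to_smaller_shape():
    result = resample_nearest(make_image(4, 4), (2, 2))
    assert shape_of(result) == (2, 2)


def test_resample_to_larger_shape():
    result = resample_nearest(make_image(4, 4), (8, 8))
    assert shape_of(result) == (8, 8)


def test_resample_returns_nested_lists():
    result = resample_nearest(make_image(4, 4), (2, 2))
    assert isinstance(result, list)
    assert all(isinstance(row, list) for row in result)
```

Each test follows the same three steps: set up the input, call the function, check the output. Building the input in code rather than loading a file keeps the tests fast and runnable on any machine.
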

D3. How do I write great error messages?

Why do error messages matter? Scientists working in a Jupyter notebook don't want to stop, open the API documentation, and search for what went wrong. They want to look at the traceback, understand what happened, and fix it on the spot. A great error message is a guide: it tells the user exactly what they did wrong, shows them the actual value that caused the problem, and tells them how to correct it. The goal is that the scientist never has to leave their terminal or notebook to resolve the issue.

A scientist hits an error at 11pm before a deadline. If the message says "Invalid scan shape", they're stuck. If it says "You passed a 2D array with shape (256, 256), but Dataset4dstem expects a 4D array. Pass scan_shape=(rows, cols) to reshape it", they fix it in 30 seconds and move on. Good error messages turn a support request into a self-service fix.

A great error message has two parts: (1) what the user did wrong and (2) a potential next step that fixes it. Include the actual value that caused the error so the user doesn't have to guess.

# Wrong - tells the user nothing
raise ValueError("Invalid scan shape")

# Wrong - says what's wrong but not how to fix it
raise ValueError(f"Got shape {data.shape}, expected 4D array")

# Right - talks directly to the user and tells them what to do
raise ValueError(
    f"You passed a {data.ndim}D array with shape {data.shape}, "
    f"but Dataset4dstem expects a 4D array "
    f"(scan_rows, scan_cols, det_rows, det_cols). "
    f"If your data is a flattened scan, pass scan_shape=(rows, cols) "
    f"to Dataset4dstem() to reshape it."
)

# Right - shows what they entered and lists valid options
raise ValueError(
    f"You entered method='{method}', which is not supported. "
    f"Choose from: {', '.join(sorted(VALID_METHODS))}."
)
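To see these messages as a user would, the two "Right" patterns can be wrapped in small validation helpers and triggered on purpose. This is a sketch: check_method, check_4d_shape, and the contents of VALID_METHODS are hypothetical and not taken from quantem.

```python
# Hypothetical set of supported methods, used only for this illustration.
VALID_METHODS = {"lanczos", "bilinear", "nearest"}


def check_method(method: str) -> str:
    """Return method unchanged, or raise with the bad value and the valid options"""
    if method not in VALID_METHODS:
        raise ValueError(
            f"You entered method='{method}', which is not supported. "
            f"Choose from: {', '.join(sorted(VALID_METHODS))}."
        )
    return method


def check_4d_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Return shape unchanged, or raise with the actual shape and a next step"""
    if len(shape) != 4:
        raise ValueError(
            f"You passed a {len(shape)}D array with shape {shape}, "
            "but a 4D array (scan_rows, scan_cols, det_rows, det_cols) is expected. "
            "If your data is a flattened scan, pass scan_shape=(rows, cols) "
            "to reshape it."
        )
    return shape


try:
    check_method("cubic")
except ValueError as err:
    print(err)
```

Both messages include the value the user actually passed, so the traceback alone is enough to spot the mistake and correct it.
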

D4. How do I represent coordinates in NumPy, Matplotlib, and quantem?

Why (row, col)? Our coordinate convention is grounded in the physical geometry of the microscope. In scanning electron microscopy, the beam scans left-to-right (fast scan direction, columns) and top-to-bottom (slow scan direction, rows). This is exactly how NumPy lays out a 2D array: array[row, col], where row is the slow axis and col is the fast axis. Our API uses (row, col) because it matches both the physics of the scan and the indexing you already use in NumPy. If we used (x, y) instead, you'd have to mentally swap axes every time you go between quantem and your arrays. That swap is exactly the kind of silent bug that produces wrong results without raising an error.

All user-facing coordinates use (row, col). The first value is always the row (slow scan, top-to-bottom), the second is the column (fast scan, left-to-right).

This applies everywhere:

Function/method parameters: center=(row, col), not center=(x, y).
Return values and dicts: {"row": r, "col": c}, not {"x": x, "y": y}.
Display and print output: coordinates shown as (row, col) in readouts, summary(), and error messages.
Variable names: pos_row/pos_col, roi_row/roi_col, not pos_x/pos_y.

Internal drawing code (canvas pixel positions, matplotlib plt.scatter(x, y), DOM events) can use x/y since those are screen coordinates, not image coordinates. But anything exposed to the user must use (row, col).

A common bug: you plot a coordinate with plt.scatter(row, col) and the point appears in the wrong place. Matplotlib expects (x, y), which is (col, row) in our convention. The table below shows the mapping:

System	Vertical axis	Horizontal axis	Notes
Our API	`row` (first)	`col` (second)	User-facing, everywhere
NumPy	`row` (axis 0)	`col` (axis 1)	`array[row, col]`, same order as our API
Matplotlib	`y` (second argument)	`x` (first argument)	`plt.scatter(col, row)`, argument order is swapped
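One way to keep the swap in a single place is a small helper that converts (row, col) points into the x and y sequences a plotting call expects. The helper name rowcol_to_xy is hypothetical, and the sketch uses a nested list in place of a NumPy array so it runs without extra packages.

```python
def rowcol_to_xy(
    points_rc: list[tuple[float, float]],
) -> tuple[list[float], list[float]]:
    """Split (row, col) points into x and y lists for plotting"""
    xs = [col for _row, col in points_rc]  # horizontal position comes from col
    ys = [row for row, _col in points_rc]  # vertical position comes from row
    return xs, ys


# A 3 x 4 image (3 rows, 4 columns) with one bright pixel at row=1, col=3.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
peak_rc = (1, 3)

# Array-style indexing uses (row, col) directly.
assert image[peak_rc[0]][peak_rc[1]] == 1

# Plotting needs (x, y) = (col, row); with Matplotlib this would be
# plt.scatter(xs, ys).
xs, ys = rowcol_to_xy([peak_rc])
print(xs, ys)  # [3] [1]
```

Keeping the conversion inside internal drawing code means user-facing values never leave the (row, col) convention.
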

D5. How do I use type hints?

Why type hints? Type hints let you understand the shape and structure of what a function expects and returns without reading the body. Paired with an editor or a static type checker, they catch bugs before runtime. When you type dataset.fourier_resample( in VS Code, type hints tell the editor what arguments are expected, so it can autocomplete and flag mistakes immediately.

They also keep variable names clean. Without type hints, you end up encoding the type into the name: stem_dataset_np_array, scan_shape_tuple. With type hints, you can just write data or scan_shape because hovering in VS Code already shows you it's an np.ndarray or tuple[int, int]. The type information lives in the signature, not cluttering the name.

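We target Python 3.11+. Use built-in generics and X | Y union syntax. Do not import Optional, List, Dict, Tuple, Union, or Any from typing. Built-in generics (list, dict, tuple) have been available since Python 3.9 and the X | Y syntax since Python 3.10, so the typing aliases are no longer needed. Any has no built-in equivalent; use object where any value is accepted.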

# Wrong - legacy typing imports
from typing import Optional, List, Dict, Tuple, Union, Any

def load(path: Optional[str] = None) -> List[Dict[str, Any]]:
    ...

def process(data: Union[np.ndarray, torch.Tensor]) -> Tuple[int, int]:
    ...

# Right - built-in syntax (Python 3.11+)
def load(path: str | None = None) -> list[dict[str, object]]:
    ...

def process(data: np.ndarray | torch.Tensor) -> tuple[int, int]:
    ...

For methods that return self (method chaining), use Self from typing:

from typing import Self

def roi_circle(self, radius: float | None = None) -> Self:
    ...
    return self

Type hints go on public API first. Internal methods (starting with _) don't strictly need them, but they're simple to add and can help with readability.

Now it's your turn. Contributing should be enjoyable. It will take dozens of PR iterations to get comfortable, just like learning to drive the microscope. Practice and feedback are the gifts that we have. One of the best ways to improve is to keep seeking feedback, iterate, and learn how to communicate effectively through PRs and code. If you're uncertain about how to write a PR, coding standards, or anything in this guide, feel free to reach out to @bobleesj. You can always make a pull request to your own fork (not upstream) and tag @bobleesj to review your code and communication style.
