Project Template: Climate + Public Health Data Science (FASRC HPC)

This template is designed for climate and public health analysis at Harvard FASRC, with:

R or Python for analysis code
Quarto for reproducible notebooks/reports
Spack for reproducible software environments on HPC

Suggested project layout

ProjectTemplate/
├── README.md
├── data/
│   ├── raw/          # immutable source data
│   ├── interim/      # temporary processing outputs
│   └── processed/    # analysis-ready datasets
├── sandbox/          # scratch work, one-off experiments, not production code
├── notebooks/        # .qmd analyses and reports
├── src/              # reusable analysis code (R or Python)
└── env/              # environment definitions (e.g., spack.yaml)
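
The layout above can be created by hand, or with a small helper script. The following is a minimal sketch, not part of the template itself: the function name scaffold and the one-line README stub are assumptions made for illustration. The directory names are copied from the tree above.

```python
from pathlib import Path

# Directory names copied from the suggested layout.
LAYOUT = [
    "data/raw",
    "data/interim",
    "data/processed",
    "sandbox",
    "notebooks",
    "src",
    "env",
]


def scaffold(root):
    """Create the template directories under root; safe to re-run."""
    root = Path(root)
    for rel in LAYOUT:
        # parents=True also creates data/ itself; exist_ok=True makes re-runs harmless.
        (root / rel).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Placeholder content (assumption); never overwrites an existing README.
        readme.write_text("# Project\n")
    return root
```

Calling scaffold("ProjectTemplate") from the parent directory would produce the tree shown above, leaving any existing files untouched.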

Environment setup (Spack)

Use Spack to pin software versions and keep runs reproducible across cluster nodes.

Use curl to query the Posit Package Manager API for the system requirements of the R packages you plan to use, which can help you decide what to add to the Spack environment:

curl -X POST "https://packagemanager.posit.co/__api__/repos/4/sysreqs" \
  -H "Content-Type: application/json" \
  -d '{
    "requirements":["targets","readxl","dplyr","fusen","quarto"],
    "r_version":"4.5.1",
    "bioc_version":"3.21"
  }'

# Example workflow (adjust to your local FASRC setup)
spack env create project-env ./env/spack.yaml
spack env activate project-env
spack install

Installing analysis packages

Python

python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

R

install.packages(c("tidyverse", "arrow", "data.table", "sf", "targets"))
# Optional for project-local package management:
# install.packages("renv")
# renv::init()

Quarto usage

Keep computational narratives in .qmd files under notebooks/.

quarto render notebooks/

Best practices: move analysis into a package

As analyses stabilize, move reusable code from notebooks/scripts into a package.

Python package path

Put reusable functions/classes in src/<package_name>/
Add pyproject.toml and package metadata early
Keep notebooks thin: call package functions instead of embedding logic
Add tests for package functions and run them in CI/HPC jobs
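
As an illustration of the points above, here is a minimal sketch of a package function with a matching test. The package name heatstats, the module name exposure.py, and the function exceedance_days are all hypothetical; substitute whatever logic your notebooks currently embed.

```python
# src/heatstats/exposure.py  (hypothetical package and module names)
from typing import Iterable


def exceedance_days(temps_c: Iterable[float], threshold_c: float) -> int:
    """Count days whose temperature is strictly above threshold_c."""
    return sum(1 for t in temps_c if t > threshold_c)


# tests/test_exposure.py  (hypothetical path; a test runner such as pytest would collect this)
def test_exceedance_days():
    # 31.5 and 35.2 exceed 30.0; 30.0 itself does not count.
    assert exceedance_days([29.0, 31.5, 35.2, 30.0], threshold_c=30.0) == 2
    assert exceedance_days([], threshold_c=30.0) == 0
```

A notebook then only imports and calls exceedance_days, so the logic is tested once and reused everywhere.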

R package path

Organize reusable functions in R/
Use DESCRIPTION + NAMESPACE and document with roxygen2
Use usethis/devtools to scaffold and maintain package structure
Keep analyses as consumers of package functions, not as the source of truth

Directory conventions

sandbox/: scratch space for exploratory code and temporary files. Treat as disposable.
data/: canonical project data location. Keep raw data read-only; write derived data to interim/ or processed/.
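
One way to enforce the read-only convention for raw data is to strip write permission from the files once they are downloaded. The sketch below assumes a POSIX filesystem; the function name lock_raw_data is hypothetical, and shared FASRC storage may have its own permission policies that take precedence.

```python
import stat
from pathlib import Path


def lock_raw_data(raw_dir):
    """Remove write permission from every file under raw_dir; return the files touched."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    locked = []
    for path in sorted(Path(raw_dir).rglob("*")):
        if path.is_file():
            # Keep the existing permission bits, minus the write bits.
            mode = stat.S_IMODE(path.stat().st_mode)
            path.chmod(mode & ~write_bits)
            locked.append(path)
    return locked
```

Running lock_raw_data("data/raw") after each data download keeps accidental edits from silently changing the inputs; derived outputs still go to interim/ or processed/.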

Reproducibility notes

Pin environment dependencies (Spack + language-specific lockfiles).
Avoid editing raw input files.
Promote shared logic into versioned package code.
Render Quarto reports from scripted, repeatable pipelines.
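
A scripted render step might look like the following. This is a minimal sketch assuming the quarto CLI is on PATH and that reports live directly under notebooks/. The function names render_commands and render_all are hypothetical.

```python
import subprocess
from pathlib import Path


def render_commands(notebook_dir="notebooks"):
    """Return one `quarto render <file>` command per .qmd file, in sorted order."""
    files = sorted(Path(notebook_dir).glob("*.qmd"))
    return [["quarto", "render", str(p)] for p in files]


def render_all(notebook_dir="notebooks"):
    """Run each render command in order; raises CalledProcessError on the first failure."""
    for cmd in render_commands(notebook_dir):
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes the pipeline easy to inspect or log before it is submitted as a batch job.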
