This template is designed for climate and public health analysis at Harvard FASRC, with:
- R or Python for analysis code
- Quarto for reproducible notebooks/reports
- Spack for reproducible software environments on HPC
ProjectTemplate/
├── README.md
├── data/
│ ├── raw/ # immutable source data
│ ├── interim/ # temporary processing outputs
│ └── processed/ # analysis-ready datasets
├── sandbox/ # scratch work, one-off experiments, not production code
├── notebooks/ # .qmd analyses and reports
├── src/ # reusable analysis code (R or Python)
└── env/ # environment definitions (e.g., spack.yaml)
Use Spack to pin software versions and keep runs reproducible across cluster nodes.
Use curl to predict which packages you will need by pinging the Posit Package Manager Repository:
curl -X POST "https://packagemanager.posit.co/__api__/repos/4/sysreqs" \
-H "Content-Type: application/json" \
-d '{
"requirements":["targets","readxl","dplyr","fusen","quarto"],
"r_version":"4.5.1",
"bioc_version":"3.21"
}'# Example workflow (adjust to your local FASRC setup)
spack env create project-env ./env/spack.yaml
spack env activate project-env
spack installpython -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txtinstall.packages(c("tidyverse", "arrow", "data.table", "sf", "targets"))
# Optional for project-local package management:
# install.packages("renv")
# renv::init()Keep computational narratives in .qmd files under notebooks/.
quarto render notebooks/As analyses stabilize, move reusable code from notebooks/scripts into a package.
- Put reusable functions/classes in
src/<package_name>/ - Add
pyproject.tomland package metadata early - Keep notebooks thin: call package functions instead of embedding logic
- Add tests for package functions and run them in CI/HPC jobs
- Organize reusable functions in
R/ - Use
DESCRIPTION+NAMESPACEand document withroxygen2 - Use
usethis/devtoolsto scaffold and maintain package structure - Keep analyses as consumers of package functions, not as the source of truth
sandbox/: scratch space for exploratory code and temporary files. Treat as disposable.data/: canonical project data location. Keep raw data read-only; write derived data tointerim/orprocessed/.
- Pin environment dependencies (Spack + language-specific lockfiles).
- Avoid editing raw input files.
- Promote shared logic into versioned package code.
- Render Quarto reports from scripted, repeatable pipelines.