102 changes: 102 additions & 0 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
name: CI Docs

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  build-docs:
    name: "Build Docs"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0 # Full history for accurate page timestamps

      - uses: actions/setup-python@v6
        with:
          python-version: "3.12"

      - name: Install package and dependencies
        run: |
          python -m pip install uv
          uv sync
          uv pip install great-docs

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Build docs
        run: uv run great-docs build

      - name: Save docs artifact
        uses: actions/upload-artifact@v7
        with:
          name: docs-html
          path: great-docs/_site

  publish-docs:
    name: "Publish Docs"
    runs-on: ubuntu-latest
    needs: "build-docs"
    if: github.ref == 'refs/heads/main'
    permissions:
      pages: write
      id-token: write
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/download-artifact@v7
        with:
          name: docs-html
          path: great-docs/_site

      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v5
        with:
          path: great-docs/_site

      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v5

  preview-docs:
    name: "Preview Docs"
    runs-on: ubuntu-latest
    needs: "build-docs"
    if: github.event_name == 'pull_request'
    permissions:
      deployments: write
      pull-requests: write
    steps:
      - uses: actions/download-artifact@v7
        with:
          name: docs-html
          path: great-docs/_site

      # Start deployment
      - name: Configure pull release name
        if: ${{ github.event_name == 'pull_request' }}
        run: |
          echo "RELEASE_NAME=pr-${{ github.event.number }}" >> $GITHUB_ENV

      - name: Configure branch release name
        if: ${{ github.event_name != 'pull_request' }}
        run: |
          # use branch name, but replace slashes. E.g. feat/a -> feat-a
          echo "RELEASE_NAME=${GITHUB_REF_NAME//\//-}" >> $GITHUB_ENV

      # Deploy
      - name: Create Github Deployment
        uses: bobheadxi/deployments@v1
        id: deployment
        if: ${{ !github.event.pull_request.head.repo.fork }}
        with:
          step: start
          token: ${{ secrets.GITHUB_TOKEN }}
          env: ${{ env.RELEASE_NAME }}
          ref: ${{ github.head_ref }}
          logs: "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.12
3.14
78 changes: 78 additions & 0 deletions README.md
@@ -0,0 +1,78 @@
# Stagecoach: A CLI tool for Staging Analysis Data

Stagecoach is a CLI tool for "staging" analysis
data to your project. It helps you organize and orchestrate input
data for your projects in a consistent, reproducible way,
making it easy to scaffold your data science project.
All of the configuration is done through a `stagecoach_manifest.yaml`
file, which you can customize to fit your needs.
Simply run `stagecoach hail` to request a manifest, edit it to source your
data, and request the data to be staged with `stagecoach stage`. The tool
will run the necessary checks to ensure you have permission to access the
data. Additionally, `stagecoach` gently promotes best practices for data
management, such as symlinking source data, using R or Python projects,
using `git` for version control, `gitignore`-ing sensitive data,
and more! You can use the `stagecoach inspect` command to have it inspect
your manifest and report any issues before you stage your data.

The best way to get `stagecoach` is to install it using `uv`.
First, make sure you have [`uv` installed](https://docs.astral.sh/uv/getting-started/installation/):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then, [create a project](https://docs.astral.sh/uv/concepts/projects/init/#creating-a-minimal-project) that `uv` will isolate for you with a
virtual environment:

```bash
uv init my-project
cd my-project  # run `uv venv` inside the new project so the environment is created there
uv venv
```

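Finally, you can install `stagecoach` as a command-line tool
with the following command (note that `uv tool install` places the tool
in its own isolated environment rather than in the project's virtual environment):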

```bash
# by using tool install, you get this tool available as a command line tool
uv tool install git+https://github.com/GoldenPlanetaryHealthLab/stagecoach.git@2-Stage-Data
```

## Usage

First, hail a `stagecoach`:

```bash
stagecoach hail
```

If you are NOT part of a Frontier workspace,
you must use the `--outpost` flag to mock one.
This flag was designed for collaborators who might not be on FASRC
but still want to use `stagecoach` to stage data for their projects.

```bash
stagecoach hail --outpost
```

Following this command, you will be prompted to answer a few questions
about your project and the data you want to stage.
This will help prefill a `stagecoach_manifest.yaml` file for you, which you
can then edit to customize your data staging.

Once you've filled in the manifest, use the `stagecoach inspect` command to check for any issues with your manifest:

```bash
stagecoach inspect
```

If there are no issues, you can proceed to stage your data with the `stagecoach stage` command:

```bash
stagecoach stage
```

This will run the necessary checks to ensure you have permission to access
the data, and then stage the data according to your manifest configuration.

Run `stagecoach [COMMAND] --help` to see more details about each command and its options.
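
The inspect-then-stage sequence described above can also be scripted. The following is a minimal Python sketch, not part of `stagecoach` itself: `inspect_then_stage` and its `run` parameter are hypothetical, and the sketch assumes that `stagecoach inspect` exits with a non-zero status when it finds blocking issues, which this README does not confirm.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def inspect_then_stage(run: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """Run `stagecoach inspect`, then `stagecoach stage` only if inspect succeeds.

    `run` is a hypothetical injection point: it receives a command and returns
    its exit code. By default it calls the real CLI through subprocess.
    """
    if run is None:
        run = lambda cmd: subprocess.run(list(cmd)).returncode

    # Assumption: a non-zero exit code from `inspect` signals blocking issues.
    if run(["stagecoach", "inspect"]) != 0:
        return False

    return run(["stagecoach", "stage"]) == 0


# Dry run with a stand-in runner that records commands instead of executing them.
calls: List[List[str]] = []


def fake_run(cmd: Sequence[str]) -> int:
    calls.append(list(cmd))
    return 0


ok = inspect_then_stage(fake_run)
print(ok, calls)
```

Calling `inspect_then_stage()` with no argument would invoke the installed `stagecoach` executable; the stand-in runner keeps the example runnable without it.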
2 changes: 2 additions & 0 deletions data/inputs/.gitignore
@@ -0,0 +1,2 @@
*
!.gitignore
2 changes: 2 additions & 0 deletions data/intermediates/.gitignore
@@ -0,0 +1,2 @@
*
!.gitignore
2 changes: 2 additions & 0 deletions data/outputs/.gitignore
@@ -0,0 +1,2 @@
*
!.gitignore
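
The three `.gitignore` files above share one pattern: `*` ignores everything in the directory, and `!.gitignore` re-includes the ignore file itself, so Git can track the otherwise empty directory while its data stays out of version control. A minimal Python sketch that recreates this layout follows; `scaffold_data_dirs` is a hypothetical helper for illustration and is not part of this PR.

```python
import tempfile
from pathlib import Path
from typing import List

# Contents and locations copied from the three .gitignore files in this diff.
GITIGNORE_BODY = "*\n!.gitignore\n"
DATA_DIRS = ("data/inputs", "data/intermediates", "data/outputs")


def scaffold_data_dirs(root: Path) -> List[Path]:
    """Create each data directory under `root` and write its .gitignore."""
    written = []
    for rel in DATA_DIRS:
        target = root / rel
        target.mkdir(parents=True, exist_ok=True)
        ignore_file = target / ".gitignore"
        ignore_file.write_text(GITIGNORE_BODY)
        written.append(ignore_file)
    return written


# Demonstrate in a throwaway directory so nothing in the real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    for path in scaffold_data_dirs(Path(tmp)):
        print(path.relative_to(tmp))
```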
121 changes: 117 additions & 4 deletions notebooks/checks.qmd
@@ -60,6 +60,18 @@ from enum import Enum
from pathlib import Path

class Severity(Enum):
    """
    Severity levels for manifest validation results.

    Attributes
    ----------
    PASS : str
        The check succeeded.
    WARNING : str
        The check surfaced a non-blocking issue.
    ERROR : str
        The check found a blocking issue.
    """
    PASS = "pass"
    WARNING = "warning"
    ERROR = "error"
@@ -80,7 +92,18 @@ PRINCIPLES = {
@dataclass
class CheckResult:
    """
    Result from a single Stagecoach manifest check.
    Represent the outcome of a single manifest check.

    Attributes
    ----------
    name : str
        Stable identifier for the check.
    state : Severity
        Severity assigned to the check result.
    message : str
        Human-readable explanation of the outcome.
    principle : int | list[int]
        Frontier principle number, or numbers, associated with the check.
    """
    name: str
    state: Severity
@@ -110,6 +133,19 @@ Now, let's define a few checks:

```{python}
def check_project_exists(directory: Path) -> CheckResult:
    """
    Verify that the project directory exists.

    Parameters
    ----------
    directory : Path
        Directory declared as the project working directory.

    Returns
    -------
    CheckResult
        Passes when ``directory`` exists and is a directory.
    """
    if directory.exists() and directory.is_dir():
        return CheckResult(
            name="project_exists",
@@ -126,9 +162,17 @@ def check_project_exists(directory: Path) -> CheckResult:

def check_git_repo_exists(directory: Path) -> CheckResult:
    """
    Check that the project directory is a git repository.
    Verify that the project directory is under Git version control.

    Parameters
    ----------
    directory : Path
        Project directory to inspect.

    Pass/Fail, no exceptions.
    Returns
    -------
    CheckResult
        Passes when a ``.git`` directory is present.
    """
    if (directory / ".git").exists():
        return CheckResult(
@@ -146,6 +190,20 @@ def check_git_repo_exists(directory: Path) -> CheckResult:
)

def check_environment_exists(directory: Path) -> CheckResult:
    """
    Look for a project environment specification or lockfile.

    Parameters
    ----------
    directory : Path
        Project directory to inspect.

    Returns
    -------
    CheckResult
        Passes when at least one supported environment file exists and
        warns otherwise.
    """
    candidates = [
        "rv.lock",
        "renv.lock",
@@ -186,6 +244,20 @@ started writing code:
#| sorting-hat: keep

def check_code_exists(directory: Path) -> CheckResult:
    """
    Check whether the project contains analysis or source code files.

    Parameters
    ----------
    directory : Path
        Project directory to inspect recursively.

    Returns
    -------
    CheckResult
        Passes when at least one supported code or notebook file is found
        outside ignored directories.
    """
    patterns = [
        "**/*.py",
        "**/*.R",
@@ -240,6 +312,20 @@ of things like READMEs, notebooks, or markdown files:

```{python}
def check_narrative_exists(directory: Path) -> CheckResult:
    """
    Check whether the project contains narrated analysis notebooks.

    Parameters
    ----------
    directory : Path
        Project directory to inspect recursively.

    Returns
    -------
    CheckResult
        Passes when Quarto, R Markdown, or Jupyter notebooks are present
        outside ignored directories.
    """

    patterns = [
        "**/*.qmd",
@@ -285,6 +371,20 @@ def check_narrative_exists(directory: Path) -> CheckResult:
)

def check_readme_exists(directory: Path) -> CheckResult:
    """
    Check whether the project contains a top-level README document.

    Parameters
    ----------
    directory : Path
        Project directory to inspect.

    Returns
    -------
    CheckResult
        Passes when a supported README filename exists and errors
        otherwise.
    """
    if (directory / "README.md").exists() or (directory / "README.Rmd").exists() or (directory / "README.qmd").exists() or (directory / "README").exists():
        return CheckResult(
            name="readme_exists",
@@ -312,6 +412,20 @@ or a Python project structure with a `src` directory.
#| sorting-hat: keep
#|
def check_project_structure(directory: Path) -> CheckResult:
    """
    Check for a minimal project scaffold.

    Parameters
    ----------
    directory : Path
        Project directory to inspect.

    Returns
    -------
    CheckResult
        Passes when the project contains either an R project file or a
        Python-style ``src`` directory.
    """
    has_rproj = (directory / "project.Rproj").exists()
    has_src = (directory / "src").exists() and (directory / "src").is_dir()

@@ -336,4 +450,3 @@ def check_project_structure(directory: Path) -> CheckResult:

In the next module, we define the ManifestChecker class, which will run all of these
checks and more, and return a structured report that can be displayed in `stagecoach inspect`.
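
How checks like these might be run together can be sketched as follows. This is not the `ManifestChecker` class from the next module, whose interface is not shown in this diff. `run_checks` and `has_blocking_issues` are hypothetical helpers, `Severity` and `CheckResult` are pared-down stand-ins for the definitions above (the `principle` field is omitted), and the check messages, the `git_repo_exists` name, and its warning severity are invented for illustration.

```python
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, Iterable, List


class Severity(Enum):
    PASS = "pass"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class CheckResult:
    # Stand-in for the notebook's dataclass; `principle` is left out here.
    name: str
    state: Severity
    message: str


Check = Callable[[Path], CheckResult]


def check_project_exists(directory: Path) -> CheckResult:
    if directory.exists() and directory.is_dir():
        return CheckResult("project_exists", Severity.PASS, "Project directory found.")
    return CheckResult("project_exists", Severity.ERROR, "Project directory is missing.")


def check_git_repo_exists(directory: Path) -> CheckResult:
    # Assumption: the check name and the WARNING severity are illustrative only.
    if (directory / ".git").exists():
        return CheckResult("git_repo_exists", Severity.PASS, "Git repository found.")
    return CheckResult("git_repo_exists", Severity.WARNING, "No .git directory found.")


def run_checks(directory: Path, checks: Iterable[Check]) -> List[CheckResult]:
    """Apply every check to the same directory and collect the results in order."""
    return [check(directory) for check in checks]


def has_blocking_issues(results: Iterable[CheckResult]) -> bool:
    """Report whether any result carries the blocking ERROR severity."""
    return any(result.state is Severity.ERROR for result in results)


# Run against a fresh temporary directory: it exists but has no .git folder.
with tempfile.TemporaryDirectory() as tmp:
    results = run_checks(Path(tmp), [check_project_exists, check_git_repo_exists])

for result in results:
    print(f"{result.state.value:8} {result.name}: {result.message}")
print("blocking:", has_blocking_issues(results))
```

Keeping each check as a plain function from `Path` to `CheckResult` means a runner only needs a list of callables, so new checks can be added without changing the runner.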
