
closure

[closure architecture banner]

closure is a machine learning framework for fluid closure modeling on ECsim and iPiC3D data.

The training stack is now based on PyTorch Lightning.

Highlights

  • Lightning-native training with clear separation between model and data logic.
  • YAML-driven experiments through LightningCLI.
  • Built-in callbacks for timing and memory monitoring.
  • Evaluation and plotting helpers compatible with the new module/datamodule API.

Core Components

  • closure/module.py: ClosureLitModule (lightning.LightningModule)
  • closure/datamodule.py: ClosureDataModule (lightning.LightningDataModule)
  • closure/models.py: network architectures (MLP, FCNN, ResNet, CNet)
  • closure/cli.py: CLI entry point (closure-train)
  • closure/eval_cli.py: run evaluation CLI (closure-eval)
  • closure/callbacks.py: MemoryMonitorCallback, TimingCallback, TorchScriptCheckpointExportCallback
  • closure/evaluation.py: post-training metrics and prediction transforms
  • closure/visualization.py: prediction vs ground-truth plotting

Installation

Basic Installation

pip install -e .

This installs the core framework with PyTorch, PyTorch Lightning, and essential utilities.

Optional Dependencies

We provide several optional extras for different use cases:

Hyperparameter Optimization (Optuna)

For hyperparameter search with Optuna, install the hp extra:

pip install -e ".[hp]"

Includes: optuna, optuna-integration, scikit-learn, plotly, nbformat

Jupyter Notebooks

For interactive notebook development:

pip install -e ".[notebook]"

Includes: jupyter, ipykernel, notebook, ipywidgets

Combined Installation (HP + Notebooks)

pip install -e ".[hp,notebook]"

Development

For development, testing, and linting:

pip install -e ".[dev]"

Includes: pytest, pytest-cov, ruff, pre-commit

GPU/CUDA Support

The package includes PyTorch, torchvision, and torchaudio but defaults to CPU builds. To enable GPU support, force-reinstall the PyTorch packages from the appropriate CUDA index (required because pip will otherwise skip the reinstall if versions match):

CUDA 12.4 (Recommended for driver ≥ 525.60):

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

CUDA 12.1:

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu121

CPU-only (no GPU):

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cpu

Note: Check your NVIDIA driver version with nvidia-smi. The driver's CUDA version must be ≥ the toolkit version. For example, driver CUDA 12.8 supports cu124 but not cu130.

Verify GPU support after installation:

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")

Recommended Installation for Hyperparameter Sweep Workflows

If you want to use the Optuna hyperparameter sweep functionality with GPU acceleration:

# Install core + hyperparameter optimization + notebooks
pip install -e ".[hp,notebook]"

# Then force-reinstall GPU-enabled PyTorch for your platform (e.g., CUDA 12.4)
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

Quick Start with Requirements Files

We provide pre-made requirements files for common workflows:

Core only (CPU):

pip install -r requirements.txt

Hyperparameter optimization (Optuna + analysis):

pip install -r requirements-hp.txt

Development and testing:

pip install -r requirements-dev.txt

GPU support with CUDA 12.4:

pip install -r requirements.txt
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

Full stack (HP + Notebooks + Dev — matches closure-test env):

pip install -r requirements-dev.txt

For GPU support, force-reinstall PyTorch from the appropriate CUDA index:

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

See requirements-gpu.txt for detailed instructions on GPU installation for different CUDA versions.

Verifying Installation

Test that everything is installed correctly:

# Test core imports
python -c "import closure; import lightning; import torch; print('✅ Core packages OK')"

# Test optional imports (if installed with [hp])
python -c "import optuna; import plotly; import sklearn; print('✅ HP packages OK')"

# Test notebook imports (if installed with [notebook])
python -c "import jupyter; import ipykernel; print('✅ Notebook packages OK')"

# Test GPU (if CUDA enabled)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device count: {torch.cuda.device_count()}')"

# Test CLI
closure-train --help
closure-eval --help

# Test Optuna sweep (hyperparameter optimization)
python examples/optuna/harris_optuna_sweep.py --help

Quick Start (Python API)

import lightning as L

from closure.datamodule import ClosureDataModule
from closure.models import MLP
from closure.module import ClosureLitModule

network = MLP(feature_dims=[10, 64, 32, 6], activations=["Tanh", "ReLU", None])

module = ClosureLitModule(
    network=network,
    criterion="MSELoss",
    optimizer="Adam",
    lr=5e-4,
    scheduler="ReduceLROnPlateau",
)

datamodule = ClosureDataModule(
    data_folder="/path/to/data",
    norm_folder="/path/to/norm",
    train_samples_file="/path/to/train.csv",
    val_samples_file="/path/to/val.csv",
    test_samples_file="/path/to/test.csv",
    batch_size=512,
    flatten=True,
    read_features_targets_kwargs={
        "request_features": ["rho_e", "Bx", "By", "Bz", "Vx_e", "Vy_e", "Vz_e", "Ex", "Ey", "Ez"],
        "request_targets": ["Pxx_e", "Pyy_e", "Pzz_e", "Pxy_e", "Pxz_e", "Pyz_e"],
    },
)

trainer = L.Trainer(max_epochs=50, accelerator="auto")
trainer.fit(module, datamodule=datamodule)
trainer.test(module, datamodule=datamodule)

Quick Start (CLI)

Use provided YAML configs under configs/.

closure-train fit --config configs/default.yaml

Override parameters directly from CLI:

closure-train fit \
  --config configs/default.yaml \
  --model.network.class_path=closure.models.ResNet \
  --model.lr=1e-3 \
  --data.batch_size=256

Evaluate a trained run from CLI

closure-eval reproduces the common notebook evaluation workflow using RunLoader and writes artifacts directly into the selected run/version folder (or a custom output directory):

  • prints config summary, history tail, best epoch, and test metrics to terminal
  • writes per-channel test metrics CSV
  • saves history and channel-metrics figures to img/
  • optionally renders per-target field plots (real/predict/error)

Quick tutorial:

# 1. Activate the project environment.
# For the HPC module-based workflow:
source activate_hpc.sh

# 2. Run evaluation on one saved run.
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1

# 3. Restrict to a few targets or samples when iterating on plots.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --targets Pxx_e Pyy_e Pzz_e \
  --max-plots 3

# 4. Reuse the trained model on a different test split.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv

# 5. Export only scalar reports when you do not want images.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --skip-field-plots

Useful options:

  • --run-dir or --version-dir: evaluate one explicit saved run
  • --run-dir <parent_folder>: evaluate all direct child run folders in batch mode (unfinished runs are skipped)
  • --log-root: automatically pick the latest run_* or version_* folder
  • --targets: restrict field plots to selected target names
  • --max-plots: limit how many time slices are rendered
  • --test-samples-file: override the test set without editing config files
  • --output-dir: write CSV/figures somewhere else
  • --skip-history-plot, --skip-metrics-plot, --skip-field-plots: export only what you need

Examples:

# Evaluate one explicit run/version directory
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001

# Evaluate all runs under a parent folder (skips unfinished runs)
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/ablations_long1000_serial/runs

# Or pick the latest run_*/version_* under a root directory
closure-eval --log-root models/Lightning/iPiC3D-nathan5-12/test

# Override the test split without editing config.yaml
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
  --test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv

# Only export metrics/history (no field plots)
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
  --skip-field-plots

Default output layout:

  • <run_or_version_dir>/test_metrics.csv
  • <run_or_version_dir>/img/history.png
  • <run_or_version_dir>/img/channel_metrics.png
  • <run_or_version_dir>/img/<target>_cycle<CYCLE>_{real,predict,error}.png
  • <run_or_version_dir>/img/<target>_cycles<FIRST-LAST>_summary.png
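For quick post-hoc analysis, the exported metrics CSV can be loaded with pandas (a minimal sketch; the path below matches the earlier examples, and pandas is assumed to be installed):

```python
import pandas as pd

# Inspect the per-channel test metrics exported by closure-eval.
metrics = pd.read_csv("models/Lightning/iPiC3D-nathan5-12/test/run_001/test_metrics.csv")
print(metrics)
```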

## Logging and Artifacts

Lightning logging is used by default (CSV logger in configs).

`closure.log` is written alongside the Lightning CSV logger outputs. If you set
`--trainer.logger.init_args.name` and `--trainer.logger.init_args.version`, the
log file goes into that exact run directory. If you omit `version`, Lightning's
auto-created `version_*` directory is used, so `closure.log` lives inside the
same per-run folder as `metrics.csv`.
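For example, pinning both explicitly (names illustrative):

```bash
closure-train fit \
  --config configs/default.yaml \
  --trainer.logger.init_args.name=my_experiment \
  --trainer.logger.init_args.version=run_1
```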

Typical outputs include:

- `lightning_logs/` or configured logger directory
- `metrics.csv`
- checkpoints from `ModelCheckpoint`
- matching TorchScript exports beside each checkpoint, e.g. `checkpoints/best-epoch=3-val_loss=0.1234.pt`
- normalized feature/target statistics in `norm_folder`

Legacy files like `loss_dict.pkl` are no longer used.

## Production Setup

This section covers everything needed to go from raw simulation data to
production training runs.

### 1. `paths.yaml`

Create a `paths.yaml` in the repository root (copy from `paths.yaml.example`):

```yaml
work_dir: ./models       # training outputs, checkpoints, normalization stats
data_dir: /scratch/data   # root of your simulation data
```

Relative paths in `paths.yaml` are resolved against the directory that contains the file. All config parameters that accept paths use a three-tier resolution strategy, implemented by `ClosureDataModule._resolve_path`:

| Path form | Example | Resolution |
|---|---|---|
| Absolute | `/scratch/data/Harris` | Used as-is |
| Dot-relative (`./`, `../`) | `./data/train.csv` | Resolved against the current working directory |
| Bare identifier | `ecsim/Harris/Le` | Joined with the corresponding `paths.yaml` root (`data_dir` or `work_dir`) |
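A minimal sketch of that behaviour (illustrative only, not the actual `_resolve_path` implementation):

```python
from pathlib import Path

def resolve_path(value: str, root: str) -> Path:
    """Illustrative three-tier resolution; mirrors the table above."""
    p = Path(value)
    if p.is_absolute():
        return p                      # absolute: used as-is
    if value.startswith(("./", "../")):
        return Path.cwd() / p         # dot-relative: resolved against the CWD
    return Path(root) / p             # bare identifier: joined with a paths.yaml root
```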

### 2. Data directory structure

Simulation data is stored as HDF5 or pickle files under data_dir, organized by experiment. Each file contains a single simulation time step:

data_dir/
  ecsim/Harris/Le/
    T2D14_filter2/
      T2D-Fields_00500.h5.pkl
      T2D-Fields_01000.h5.pkl
      ...
    T2D15_filter2/
      T2D-Fields_00500.h5.pkl
      ...

The files are read by closure.read_pic.read_features_targets, which extracts the requested field channels (B, E, rho, J, P, etc.) and species.
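As a rough sketch of a direct call (argument names are assumed from the `read_features_targets_kwargs` shown in the config section below; consult `closure/read_pic.py` for the real signature):

```python
from closure.read_pic import read_features_targets

# Assumed call pattern; the actual signature may differ.
features, targets = read_features_targets(
    "T2D14_filter2/T2D-Fields_00500.h5.pkl",
    request_features=["rho_e", "Bx", "By", "Bz"],
    request_targets=["Pxx_e", "Pyy_e", "Pzz_e"],
)
```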

### 3. Creating train/val/test splits

Use `scripts/datasplit.py` to build CSV split files. Each CSV has a single `filenames` column listing the data file paths:

# Training set from two simulation folders (time steps 5000–10000)
python scripts/datasplit.py \
    folders=[T2D14_filter2,T2D15_filter2] \
    name=train.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/ \
    min_number=5000 max_number=10000

# Validation set from a held-out folder
python scripts/datasplit.py \
    folders=[T2D16_filter2] \
    name=val.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/

# Test set
python scripts/datasplit.py \
    folders=[T2D17_filter2] \
    name=test.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/

Arguments:

| Argument | Required | Description |
|---|---|---|
| `folders` | yes | Folder names or paths to search, e.g. `[a,b,c]` |
| `name` | yes | Output CSV filename |
| `root_folder` | no | Root prepended to each folder path |
| `pattern` | no | Glob pattern (default: `T2D-Fields_*`) |
| `min_number` | no | Exclude files with time-step number below this |
| `max_number` | no | Exclude files with time-step number above this |
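To sanity-check a generated split file (a minimal sketch, assuming pandas is installed):

```python
import pandas as pd

# Each split CSV has a single "filenames" column of data file paths.
split = pd.read_csv("train.csv")
print(len(split), "samples")
print(split["filenames"].head())
```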

### 4. Writing a YAML config

Three annotated templates are provided under configs/:

| Template | Architecture | Data shape | Use case |
|---|---|---|---|
| `configs/default.yaml` | FCNN | 2-D patches | CNN-based closure |
| `configs/mlp.yaml` | MLP | Flattened pixels | Pixel-wise baseline |
| `configs/resnet.yaml` | ResNet | 2-D patches | Deep residual closure |

Copy one and customize. Key sections explained:

```yaml
data:
  data_folder: ecsim/Harris/Le           # bare → joined with data_dir
  norm_folder: Harris/Le/my_experiment   # bare → joined with work_dir
  train_samples_file: ./splits/train.csv  # ./ → CWD-relative
  val_samples_file: ./splits/val.csv
  test_samples_file: ./splits/test.csv
  flatten: false                          # true for MLP, false for CNN/ResNet
  patch_dim: [32, 32]                     # random crop size (CNN/ResNet only)
  scaler_features: true                   # enable per-channel standardization
  scaler_targets: true
  prescaler_features:                     # per-channel transforms before standardization
    - arcsinh    # rho_e
    - null       # Bx  (no prescaling)
    - ...
  prescaler_targets:
    - log        # Pxx_e (positive-definite diagonal)
    - arcsinh    # Pxy_e (signed off-diagonal)
    - ...
  read_features_targets_kwargs:
    fields_to_read:                       # which HDF5 field groups to load
      B: true
      E: true
      rho: true
      J: true
      P: true
      PI: true
    request_features:                     # specific channels extracted from fields
      - rho_e
      - Bx
      - By
      - Bz
      - Jx_e
      - Jy_e
      - Jz_e
      - Vx_e
      - Vy_e
      - Vz_e
    request_targets:
      - Pxx_e
      - Pyy_e
      - Pzz_e
      - Pxy_e
      - Pxz_e
      - Pyz_e
    choose_species: ['e', null]           # electron species for multi-species data
    choose_x: [0, 512]                    # spatial domain crop
    choose_y: [175, 325]
```

Prescaler guidance:

  • log — for strictly positive quantities (diagonal pressure)
  • arcsinh — for quantities that can be negative or span orders of magnitude
  • null — no prescaling
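As a quick illustration of why these transforms are paired with those channel types, here is a minimal NumPy sketch (transform forms assumed; check closure's own prescaler implementation for exact conventions):

```python
import numpy as np

pxx = np.array([1e-6, 1e-3, 1.0])   # strictly positive diagonal pressure
pxy = np.array([-50.0, 0.0, 50.0])  # signed off-diagonal component

# log compresses orders of magnitude but only accepts positive values;
# arcsinh is symmetric about zero and behaves like log for large |x|.
log_scaled = np.log(pxx)
asinh_scaled = np.arcsinh(pxy)

# The inverse transforms map predictions back to physical units.
assert np.allclose(np.exp(log_scaled), pxx)
assert np.allclose(np.sinh(asinh_scaled), pxy)
```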

### 5. Launching training

Single GPU:

closure-train fit --config my_config.yaml

Multi-GPU (DDP):

closure-train fit --config my_config.yaml \
    --trainer.devices=4 \
    --trainer.strategy=ddp

Slurm cluster:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12

srun closure-train fit --config my_config.yaml

### 6. Scaffolding experiment sweeps

For systematic architecture/feature-set sweeps, use scripts/scaffold_harris_experiments.py. It generates a directory tree of YAML configs and Slurm run.sh scripts:

python scripts/scaffold_harris_experiments.py \
    --output-root models/Harris/Le/Le2GEM15ppc_lightning \
    --data-folder ecsim/Harris/Le \
    --split-root ecsim/sampling/ecsim/Harris/Le/Le2GEM15ppc \
    --max-epochs 500 --devices 4

This creates:

Le2GEM15ppc_lightning/
  default/P/          4lrs_es500.yaml  5lrs_es500.yaml  ...  run.sh
  default/divP/       4lrs.yaml        5lrs.yaml        ...  run.sh
  noE/P/              ...
  noJ/P/              ...
  noJnoE/P/           ...

Each variant (default, noE, noJ, noJnoE) uses a different feature subset. Each task (P, divP) uses different targets and prescalers. The run.sh files are ready to submit with sbatch.
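For instance, one way to submit them all in a batch (the glob assumes the output root used above):

```bash
# Submit every generated Slurm script under the scaffolded tree.
for script in models/Harris/Le/Le2GEM15ppc_lightning/*/*/run.sh; do
  sbatch "$script"
done
```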

### 7. Evaluation and artifact export

After training, load a checkpoint and evaluate:

from closure.module import ClosureLitModule
from closure.evaluation import evaluate_loss, evaluate_regression_metrics, transform_targets

# `network` must match the architecture the checkpoint was trained with
# (see the Quick Start above); `test_dataset` is the datamodule's test split.
module = ClosureLitModule.load_from_checkpoint("best.ckpt", network=network)
ground_truth, prediction = transform_targets(module, test_dataset, ...)

# Per-channel MSE
evaluate_loss(test_dataset, ground_truth, prediction, "MSELoss", verbose=True)

# Regression metrics table (R², RMSE, Pearson r, etc.)
metrics_df = evaluate_regression_metrics(test_dataset, ground_truth, prediction)

Export deployable artifacts:

import torch

# Inference bundle (state dict + normalization stats + metadata)
torch.save({"state_dict": ..., "features_mean": ..., ...}, "inference_bundle.pt")

# TorchScript for deployment
scripted = torch.jit.script(network)
scripted.save("torchscript.pt")
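The TorchScript export can then be reloaded for inference without the Python model class (a minimal sketch; the 10-feature input shape is illustrative, matching the Quick Start MLP):

```python
import torch

# Reload the TorchScript export and run a forward pass.
model = torch.jit.load("torchscript.pt")
model.eval()
with torch.no_grad():
    out = model(torch.randn(1, 10))  # illustrative: batch of one, 10 input features
print(out.shape)
```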

See examples/tutorials/tuto_train.py for a complete end-to-end example including evaluation, visualization, and artifact export.

Examples

  • examples/tutorials/tuto_train.py: self-contained training tutorial using bundled fixture data
  • examples/tuto_train.ipynb: real-data tutorial (Lightning update section added at top)
  • examples/tuto_train_synthetic.ipynb: synthetic-data tutorial (Lightning update section added at top)
  • examples/optuna/optuna_sweep.py: Optuna sweep example with Lightning
  • examples/optuna/harris_optuna_sweep.py: Harris Le2GEM15ppc Optuna sweep for FCNN experiments

Notes on Migration

  • The old Trainer, PyNet, and closure.trainers module were removed.
  • Use ClosureLitModule + ClosureDataModule for programmatic workflows.
  • Use closure-train for config-driven workflows.

Citing & License

  • Author: George Miloshevich
  • License: MIT License
  • Projects: STRIDE, HELIOSKILL

If you use closure in your research, please cite:

@article{miloshevich2026electron,
  title = {Electron Neural Closure for Turbulent Magnetosheath Simulations: {{Energy}} Channels},
  author = {Miloshevich, G. and Vranckx, L. and de Oliveira Lopes, F. N. and Dazzi, P. and Arrò, G. and Lapenta, G.},
  year = {2026},
  journal = {Physics of Plasmas},
  volume = {33},
  number = {1},
  pages = {012901},
  issn = {1070-664X},
  doi = {10.1063/5.0300009},
}

closure is designed for flexibility, reproducibility, and ease of use in scientific ML workflows. Contributions and feedback are welcome!
