closure is a machine learning framework for fluid closure modeling on ECsim and iPiC3D data.
The training stack is now based on PyTorch Lightning.
- Lightning-native training with clear separation between model and data logic.
- YAML-driven experiments through LightningCLI.
- Built-in callbacks for timing and memory monitoring.
- Evaluation and plotting helpers compatible with the new module/datamodule API.
- `closure/module.py`: `ClosureLitModule(lightning.LightningModule)`
- `closure/datamodule.py`: `ClosureDataModule(lightning.LightningDataModule)`
- `closure/models.py`: network architectures (`MLP`, `FCNN`, `ResNet`, `CNet`)
- `closure/cli.py`: CLI entry point (`closure-train`)
- `closure/eval_cli.py`: evaluation CLI (`closure-eval`)
- `closure/callbacks.py`: `MemoryMonitorCallback`, `TimingCallback`, `TorchScriptCheckpointExportCallback`
- `closure/evaluation.py`: post-training metrics and prediction transforms
- `closure/visualization.py`: prediction vs. ground-truth plotting
```bash
pip install -e .
```

This installs the core framework with PyTorch, PyTorch Lightning, and essential utilities.
We provide several optional extras for different use cases:
Hyperparameter Optimization (Optuna)
For hyperparameter search with Optuna, install the hp extra:
```bash
pip install -e ".[hp]"
```

Includes: optuna, optuna-integration, scikit-learn, plotly, nbformat
Jupyter Notebooks
For interactive notebook development:
```bash
pip install -e ".[notebook]"
```

Includes: jupyter, ipykernel, notebook, ipywidgets
Combined Installation (HP + Notebooks)
```bash
pip install -e ".[hp,notebook]"
```

Development
For development, testing, and linting:
```bash
pip install -e ".[dev]"
```

Includes: pytest, pytest-cov, ruff, pre-commit
The package installs PyTorch, torchvision, and torchaudio, but defaults to CPU builds. To enable GPU support, force-reinstall the PyTorch packages from the appropriate CUDA index (`--force-reinstall` is required because pip would otherwise skip packages whose installed versions already match):
CUDA 12.4 (recommended for driver ≥ 525.60):

```bash
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124
```

CUDA 12.1:

```bash
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu121
```

CPU-only (no GPU):

```bash
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cpu
```

Note: Check your NVIDIA driver version with `nvidia-smi`. The CUDA version reported by the driver must be ≥ the toolkit version of the wheels. For example, a driver reporting CUDA 12.8 supports cu124 but not cu130.
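The compatibility rule above boils down to a simple version comparison. A small illustrative sketch (the helper name is ours, not part of any tooling):

```python
def driver_supports_toolkit(driver_cuda: str, toolkit_cuda: str) -> bool:
    """Return True if a driver reporting `driver_cuda` can run wheels
    built against CUDA toolkit `toolkit_cuda` (driver >= toolkit)."""
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return as_tuple(driver_cuda) >= as_tuple(toolkit_cuda)

# A driver reporting CUDA 12.8 can run cu124 wheels but not cu130 wheels.
print(driver_supports_toolkit("12.8", "12.4"))  # True
print(driver_supports_toolkit("12.8", "13.0"))  # False
```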
Verify GPU support after installation:
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")
```

If you want to use the Optuna hyperparameter sweep functionality with GPU acceleration:

```bash
# Install core + hyperparameter optimization + notebooks
pip install -e ".[hp,notebook]"

# Then force-reinstall GPU-enabled PyTorch for your platform (e.g., CUDA 12.4)
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124
```

We provide pre-made requirements files for common workflows:
Core only (CPU):

```bash
pip install -r requirements.txt
```

Hyperparameter optimization (Optuna + analysis):

```bash
pip install -r requirements-hp.txt
```

Development and testing:

```bash
pip install -r requirements-dev.txt
```

GPU support with CUDA 12.4:

```bash
pip install -r requirements.txt
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124
```

Full stack (HP + Notebooks + Dev; matches the closure-test env):

```bash
pip install -r requirements-dev.txt
```

For GPU support, force-reinstall PyTorch from the appropriate CUDA index:

```bash
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124
```

See `requirements-gpu.txt` for detailed instructions on GPU installation for different CUDA versions.
Test that everything is installed correctly:
```bash
# Test core imports
python -c "import closure; import lightning; import torch; print('✅ Core packages OK')"

# Test optional imports (if installed with [hp])
python -c "import optuna; import plotly; import sklearn; print('✅ HP packages OK')"

# Test notebook imports (if installed with [notebook])
python -c "import jupyter; import ipykernel; print('✅ Notebook packages OK')"

# Test GPU (if CUDA enabled)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device count: {torch.cuda.device_count()}')"

# Test CLI
closure-train --help
closure-eval --help

# Test Optuna sweep (hyperparameter optimization)
python examples/optuna/harris_optuna_sweep.py --help
```

```python
import lightning as L

from closure.datamodule import ClosureDataModule
from closure.models import MLP
from closure.module import ClosureLitModule

network = MLP(feature_dims=[10, 64, 32, 6], activations=["Tanh", "ReLU", None])

module = ClosureLitModule(
    network=network,
    criterion="MSELoss",
    optimizer="Adam",
    lr=5e-4,
    scheduler="ReduceLROnPlateau",
)

datamodule = ClosureDataModule(
    data_folder="/path/to/data",
    norm_folder="/path/to/norm",
    train_samples_file="/path/to/train.csv",
    val_samples_file="/path/to/val.csv",
    test_samples_file="/path/to/test.csv",
    batch_size=512,
    flatten=True,
    read_features_targets_kwargs={
        "request_features": ["rho_e", "Bx", "By", "Bz", "Vx_e", "Vy_e", "Vz_e", "Ex", "Ey", "Ez"],
        "request_targets": ["Pxx_e", "Pyy_e", "Pzz_e", "Pxy_e", "Pxz_e", "Pyz_e"],
    },
)

trainer = L.Trainer(max_epochs=50, accelerator="auto")
trainer.fit(module, datamodule=datamodule)
trainer.test(module, datamodule=datamodule)
```

Use the provided YAML configs under `configs/`:
```bash
closure-train fit --config configs/default.yaml
```

Override parameters directly from the CLI:

```bash
closure-train fit \
  --config configs/default.yaml \
  --model.network.class_path=closure.models.ResNet \
  --model.lr=1e-3 \
  --data.batch_size=256
```

`closure-eval` reproduces the common notebook evaluation workflow using `RunLoader` and writes artifacts directly into the selected run/version folder (or a custom output directory):
- prints the config summary, history tail, best epoch, and test metrics to the terminal
- writes a per-channel test metrics CSV
- saves history and channel-metrics figures to `img/`
- optionally renders per-target field plots (real/predict/error)
Quick tutorial:
```bash
# 1. Activate the project environment.
#    For the HPC module-based workflow:
source activate_hpc.sh

# 2. Run evaluation on one saved run.
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1

# 3. Restrict to a few targets or samples when iterating on plots.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --targets Pxx_e Pyy_e Pzz_e \
  --max-plots 3

# 4. Reuse the trained model on a different test split.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv

# 5. Export only scalar reports when you do not want images.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --skip-field-plots
```

Useful options:
- `--run-dir` or `--version-dir`: evaluate one explicit saved run
- `--run-dir <parent_folder>`: evaluate all direct child run folders in batch mode (unfinished runs are skipped)
- `--log-root`: automatically pick the latest `run_*` or `version_*` folder
- `--targets`: restrict field plots to selected target names
- `--max-plots`: limit how many time slices are rendered
- `--test-samples-file`: override the test set without editing config files
- `--output-dir`: write CSV/figures somewhere else
- `--skip-history-plot`, `--skip-metrics-plot`, `--skip-field-plots`: export only what you need
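Conceptually, the `--log-root` auto-pick amounts to selecting the `run_*` or `version_*` child with the highest numeric suffix. A simplified sketch of that idea (the actual selection logic inside `closure-eval` may differ):

```python
import re
from pathlib import Path
from typing import Optional

def pick_latest(log_root: str) -> Optional[Path]:
    """Pick the run_*/version_* subfolder with the highest numeric suffix."""
    pattern = re.compile(r"^(?:run|version)_(\d+)$")
    best, best_n = None, -1
    for child in Path(log_root).iterdir():
        m = pattern.match(child.name)
        if child.is_dir() and m and int(m.group(1)) > best_n:
            best, best_n = child, int(m.group(1))
    return best
```

Note that a purely lexicographic sort would rank `run_9` above `run_10`, which is why the sketch compares the parsed integers.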
Examples:
```bash
# Evaluate one explicit run/version directory
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001

# Evaluate all runs under a parent folder (skips unfinished runs)
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/ablations_long1000_serial/runs

# Or pick the latest run_*/version_* under a root directory
closure-eval --log-root models/Lightning/iPiC3D-nathan5-12/test

# Override the test split without editing config.yaml
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
  --test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv

# Only export metrics/history (no field plots)
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
  --skip-field-plots
```

Default output layout:
- `<run_or_version_dir>/test_metrics.csv`
- `<run_or_version_dir>/img/history.png`
- `<run_or_version_dir>/img/channel_metrics.png`
- `<run_or_version_dir>/img/<target>_cycle<CYCLE>_{real,predict,error}.png`
- `<run_or_version_dir>/img/<target>_cycles<FIRST-LAST>_summary.png`
## Logging and Artifacts
Lightning logging is used by default (CSV logger in configs).
`closure.log` is written alongside the Lightning CSV logger outputs. If you set
`--trainer.logger.init_args.name` and `--trainer.logger.init_args.version`, the
log file goes into that exact run directory. If you omit `version`, Lightning's
auto-created `version_*` directory is used, so `closure.log` lives inside the
same per-run folder as `metrics.csv`.
Typical outputs include:
- `lightning_logs/` or configured logger directory
- `metrics.csv`
- checkpoints from `ModelCheckpoint`
- matching TorchScript exports beside each checkpoint, e.g. `checkpoints/best-epoch=3-val_loss=0.1234.pt`
- normalized feature/target statistics in `norm_folder`
Legacy files like `loss_dict.pkl` are no longer used.
## Production Setup
This section covers everything needed to go from raw simulation data to
production training runs.
### 1. `paths.yaml`
Create a `paths.yaml` in the repository root (copy from `paths.yaml.example`):
```yaml
work_dir: ./models        # training outputs, checkpoints, normalization stats
data_dir: /scratch/data   # root of your simulation data
```

Relative paths in `paths.yaml` are resolved against the directory that contains the file. All config parameters that accept paths use a three-tier resolution strategy (implemented by `ClosureDataModule._resolve_path`):
| Path form | Example | Resolution |
|---|---|---|
| Absolute | `/scratch/data/Harris` | Used as-is |
| Dot-relative (`./`, `../`) | `./data/train.csv` | Resolved against the current working directory |
| Bare identifier | `ecsim/Harris/Le` | Joined with the corresponding `paths.yaml` root (`data_dir` or `work_dir`) |
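The three tiers can be sketched in a few lines of Python. This is a simplified stand-in for `ClosureDataModule._resolve_path`, not the actual implementation:

```python
from pathlib import Path

def resolve_path(value: str, root: str) -> Path:
    """Three-tier resolution: absolute as-is, dot-relative against the CWD,
    bare identifiers against the matching paths.yaml root."""
    p = Path(value)
    if p.is_absolute():
        return p                           # tier 1: absolute, used as-is
    if value.startswith(("./", "../")):
        return (Path.cwd() / p).resolve()  # tier 2: dot-relative to CWD
    return Path(root) / p                  # tier 3: bare, joined with the root

print(resolve_path("/scratch/data/Harris", "/data"))  # /scratch/data/Harris
print(resolve_path("ecsim/Harris/Le", "/data"))       # /data/ecsim/Harris/Le
```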
Simulation data is stored as HDF5 or pickle files under `data_dir`, organized by experiment. Each file contains a single simulation time step:

```
data_dir/
  ecsim/Harris/Le/
    T2D14_filter2/
      T2D-Fields_00500.h5.pkl
      T2D-Fields_01000.h5.pkl
      ...
    T2D15_filter2/
      T2D-Fields_00500.h5.pkl
      ...
```
The files are read by `closure.read_pic.read_features_targets`, which extracts the requested field channels (B, E, rho, J, P, etc.) and species.

Use `scripts/datasplit.py` to build CSV split files. Each CSV has a single `filenames` column listing the data file paths:
```bash
# Training set from two simulation folders (time steps 5000–10000)
python scripts/datasplit.py \
    folders=[T2D14_filter2,T2D15_filter2] \
    name=train.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/ \
    min_number=5000 max_number=10000

# Validation set from a held-out folder
python scripts/datasplit.py \
    folders=[T2D16_filter2] \
    name=val.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/

# Test set
python scripts/datasplit.py \
    folders=[T2D17_filter2] \
    name=test.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/
```

Arguments:
| Argument | Required | Description |
|---|---|---|
| `folders` | yes | Folder names or paths to search, e.g. `[a,b,c]` |
| `name` | yes | Output CSV filename |
| `root_folder` | no | Root prepended to each folder path |
| `pattern` | no | Glob pattern (default: `T2D-Fields_*`) |
| `min_number` | no | Exclude files with time-step number below this |
| `max_number` | no | Exclude files with time-step number above this |
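The `min_number`/`max_number` semantics and the single-column output can be pictured like this. An illustrative sketch only, not the script itself (the regex for extracting the step number is our assumption):

```python
import csv
import re

def filter_steps(filenames, min_number=None, max_number=None):
    """Keep files whose time-step number lies within [min_number, max_number]."""
    kept = []
    for name in filenames:
        m = re.search(r"_(\d+)", name)  # e.g. T2D-Fields_00500.h5.pkl -> 500
        if m is None:
            continue
        step = int(m.group(1))
        if min_number is not None and step < min_number:
            continue
        if max_number is not None and step > max_number:
            continue
        kept.append(name)
    return kept

files = ["T2D-Fields_00500.h5.pkl", "T2D-Fields_05000.h5.pkl", "T2D-Fields_12000.h5.pkl"]
print(filter_steps(files, min_number=5000, max_number=10000))  # ['T2D-Fields_05000.h5.pkl']

# The result is written as a single-column CSV with a `filenames` header:
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filenames"])
    writer.writerows([[name] for name in filter_steps(files, min_number=5000)])
```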
Three annotated templates are provided under `configs/`:

| Template | Architecture | Data shape | Use case |
|---|---|---|---|
| `configs/default.yaml` | FCNN | 2-D patches | CNN-based closure |
| `configs/mlp.yaml` | MLP | Flattened pixels | Pixel-wise baseline |
| `configs/resnet.yaml` | ResNet | 2-D patches | Deep residual closure |
Copy one and customize. Key sections explained:
```yaml
data:
  data_folder: ecsim/Harris/Le            # bare → joined with data_dir
  norm_folder: Harris/Le/my_experiment    # bare → joined with work_dir
  train_samples_file: ./splits/train.csv  # ./ → CWD-relative
  val_samples_file: ./splits/val.csv
  test_samples_file: ./splits/test.csv
  flatten: false         # true for MLP, false for CNN/ResNet
  patch_dim: [32, 32]    # random crop size (CNN/ResNet only)
  scaler_features: true  # enable per-channel standardization
  scaler_targets: true
  prescaler_features:    # per-channel transforms before standardization
    - arcsinh  # rho_e
    - null     # Bx (no prescaling)
    - ...
  prescaler_targets:
    - log      # Pxx_e (positive-definite diagonal)
    - arcsinh  # Pxy_e (signed off-diagonal)
    - ...
  read_features_targets_kwargs:
    fields_to_read:      # which HDF5 field groups to load
      B: true
      E: true
      rho: true
      J: true
      P: true
      PI: true
    request_features:    # specific channels extracted from fields
      - rho_e
      - Bx
      - By
      - Bz
      - Jx_e
      - Jy_e
      - Jz_e
      - Vx_e
      - Vy_e
      - Vz_e
    request_targets:
      - Pxx_e
      - Pyy_e
      - Pzz_e
      - Pxy_e
      - Pxz_e
      - Pyz_e
    choose_species: ['e', null]  # electron species for multi-species data
    choose_x: [0, 512]           # spatial domain crop
    choose_y: [175, 325]
```

Prescaler guidance:
- `log`: for strictly positive quantities (diagonal pressure)
- `arcsinh`: for quantities that can be negative or span orders of magnitude
- `null`: no prescaling
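A small numeric illustration of that guidance (the function is ours; the actual prescalers live inside the framework): `log` compresses strictly positive values spanning orders of magnitude, while `arcsinh` behaves like a signed logarithm and stays defined at zero and for negative inputs, where `log` would fail.

```python
import math

def prescale(x: float, kind):
    """Apply a per-channel prescaler before standardization (sketch)."""
    if kind == "log":
        return math.log(x)    # strictly positive inputs only
    if kind == "arcsinh":
        return math.asinh(x)  # any sign; log-like for large |x|
    return x                  # null: identity

# log for a positive-definite diagonal pressure component:
print(prescale(1e-6, "log"), prescale(1e2, "log"))
# arcsinh for a signed off-diagonal component:
print(prescale(-1e4, "arcsinh"), prescale(1e4, "arcsinh"))
```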
Single GPU:

```bash
closure-train fit --config my_config.yaml
```

Multi-GPU (DDP):

```bash
closure-train fit --config my_config.yaml \
  --trainer.devices=4 \
  --trainer.strategy=ddp
```

Slurm cluster:

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
srun closure-train fit --config my_config.yaml
```

For systematic architecture/feature-set sweeps, use `scripts/scaffold_harris_experiments.py`. It generates a directory tree of YAML configs and Slurm `run.sh` scripts:
```bash
python scripts/scaffold_harris_experiments.py \
  --output-root models/Harris/Le/Le2GEM15ppc_lightning \
  --data-folder ecsim/Harris/Le \
  --split-root ecsim/sampling/ecsim/Harris/Le/Le2GEM15ppc \
  --max-epochs 500 --devices 4
```

This creates:

```
Le2GEM15ppc_lightning/
  default/P/     4lrs_es500.yaml 5lrs_es500.yaml ... run.sh
  default/divP/  4lrs.yaml 5lrs.yaml ... run.sh
  noE/P/    ...
  noJ/P/    ...
  noJnoE/P/ ...
```
Each variant (`default`, `noE`, `noJ`, `noJnoE`) uses a different feature subset. Each task (`P`, `divP`) uses different targets and prescalers. The `run.sh` files are ready to submit with `sbatch`.
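The variant names suggest a pattern of dropping channel groups from the default feature set. A purely hypothetical sketch of that mapping: the channel lists below are our assumption for illustration; the authoritative lists are in the generated YAML configs.

```python
# Assumed default feature set, for illustration only.
DEFAULT = ["rho_e", "Bx", "By", "Bz", "Jx_e", "Jy_e", "Jz_e", "Ex", "Ey", "Ez"]

def variant_features(variant: str):
    """Drop E and/or J channels from the default feature set (hypothetical)."""
    feats = list(DEFAULT)
    if "noE" in variant:
        feats = [f for f in feats if not f.startswith("E")]
    if "noJ" in variant:
        feats = [f for f in feats if not f.startswith("J")]
    return feats

print(variant_features("noJnoE"))  # ['rho_e', 'Bx', 'By', 'Bz']
```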
After training, load a checkpoint and evaluate:
```python
from closure.module import ClosureLitModule
from closure.evaluation import evaluate_loss, evaluate_regression_metrics, transform_targets

module = ClosureLitModule.load_from_checkpoint("best.ckpt", network=network)
ground_truth, prediction = transform_targets(module, test_dataset, ...)

# Per-channel MSE
evaluate_loss(test_dataset, ground_truth, prediction, "MSELoss", verbose=True)

# Regression metrics table (R², RMSE, Pearson r, etc.)
metrics_df = evaluate_regression_metrics(test_dataset, ground_truth, prediction)
```

Export deployable artifacts:

```python
import torch

# Inference bundle (state dict + normalization stats + metadata)
torch.save({"state_dict": ..., "features_mean": ..., ...}, "inference_bundle.pt")

# TorchScript for deployment
scripted = torch.jit.script(network)
scripted.save("torchscript.pt")
```

See `examples/tutorials/tuto_train.py` for a complete end-to-end example including evaluation, visualization, and artifact export.
- `examples/tutorials/tuto_train.py`: self-contained training tutorial using bundled fixture data
- `examples/tuto_train.ipynb`: real-data tutorial (Lightning update section added at top)
- `examples/tuto_train_synthetic.ipynb`: synthetic-data tutorial (Lightning update section added at top)
- `examples/optuna/optuna_sweep.py`: Optuna sweep example with Lightning
- `examples/optuna/harris_optuna_sweep.py`: Harris Le2GEM15ppc Optuna sweep for FCNN experiments
- The old `Trainer`, `PyNet`, and `closure.trainers` module were removed.
- Use `ClosureLitModule` + `ClosureDataModule` for programmatic workflows.
- Use `closure-train` for config-driven workflows.
- Author: George Miloshevich
- License: MIT License
- Projects: STRIDE, HELIOSKILL
If you use closure in your research, please cite:
```bibtex
@article{miloshevich2026electron,
  title = {Electron Neural Closure for Turbulent Magnetosheath Simulations: {{Energy}} Channels},
  author = {Miloshevich, G. and Vranckx, L. and de Oliveira Lopes, F. N. and Dazzi, P. and Arrò, G. and Lapenta, G.},
  year = {2026},
  journal = {Physics of Plasmas},
  volume = {33},
  number = {1},
  pages = {012901},
  issn = {1070-664X},
  doi = {10.1063/5.0300009},
}
```

- `examples/tuto_train.ipynb`: Full tutorial notebook
- Source code docstrings for detailed API documentation
closure is designed for flexibility, reproducibility, and ease of use in scientific ML workflows. Contributions and feedback are welcome!
