This is the official code repository for the paper WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling.
- March 12, 2026: Training code for 0.25° resolution added.
- Feb 9, 2026: Training code released.
- Feb 3, 2026: Paper preprint available on arXiv.
WIND is a single pre-trained foundation model for weather and climate modeling that replaces specialized baselines across a wide range of tasks without any task-specific fine-tuning. It learns a task-agnostic prior of the atmosphere via a self-supervised video reconstruction objective using an unconditional video diffusion model. At inference, diverse domain-specific problems are framed as inverse problems and solved via posterior sampling.
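As a rough illustration of the inverse-problem view, here is a generic posterior-sampling formulation (not necessarily the exact sampler used in the paper). Observations $y$ are modelled as a known, possibly noisy operator $\mathcal{A}$ applied to the atmospheric state $x$. Bayes' rule then splits the posterior score into the prior score, which is learned by the diffusion model, and a task-specific likelihood term:

$$
\begin{aligned}
y &= \mathcal{A}(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_y^2 I),\\
\nabla_x \log p(x \mid y) &= \nabla_x \log p(x) + \nabla_x \log p(y \mid x),\\
\nabla_x \log p(y \mid x) &= \frac{1}{\sigma_y^2}\, J_{\mathcal{A}}(x)^\top \big(y - \mathcal{A}(x)\big).
\end{aligned}
$$

Under this formulation, switching tasks only changes $\mathcal{A}$ and $y$, while the prior stays fixed. During sampling, $p(y \mid x_t)$ for a noisy intermediate state $x_t$ is intractable, so diffusion posterior samplers approximate it. The paper describes how WIND handles this.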
Supported tasks include:
- Probabilistic ensemble forecasting
- Spatial and temporal downscaling
- Sparse reconstruction
- Enforcing conservation laws
- Counterfactual storylines of extreme weather events under global warming scenarios
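To make the inverse-problem framing concrete for some of the tasks above, here is a minimal NumPy sketch of possible observation operators. Each operator maps a full atmospheric state to what is actually observed. The function names and the `(T, C, H, W)` layout are hypothetical illustrations, not this repository's API.

```python
import numpy as np

# Hypothetical observation operators for a state x of shape (T, C, H, W):
# T time steps, C channels, H latitudes, W longitudes.

def forecast_op(x: np.ndarray, n_context: int) -> np.ndarray:
    """Forecasting: only the first n_context time steps are observed."""
    return x[:n_context]

def temporal_downscale_op(x: np.ndarray, stride: int) -> np.ndarray:
    """Temporal downscaling: only every stride-th time step is observed."""
    return x[::stride]

def spatial_downscale_op(x: np.ndarray, factor: int) -> np.ndarray:
    """Spatial downscaling: observe factor x factor block averages (a coarse grid)."""
    t, c, h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    blocks = x.reshape(t, c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(3, 5))

def sparse_op(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Sparse reconstruction: observe only grid points where the (H, W) mask is True."""
    return x[..., mask]

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((5, 70, 8, 16))
    mask = np.zeros((8, 16), dtype=bool)
    mask[::4, ::4] = True  # 2 x 4 = 8 "stations"
    print(forecast_op(x, 2).shape)            # (2, 70, 8, 16)
    print(temporal_downscale_op(x, 2).shape)  # (3, 70, 8, 16)
    print(spatial_downscale_op(x, 4).shape)   # (5, 70, 2, 4)
    print(sparse_op(x, mask).shape)           # (5, 70, 8)
```

The prior does not change between tasks. Only the operator and the observed data do.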
Requirements:
- Python >= 3.11
- PyTorch >= 2.8.0
Clone the repository and install dependencies using uv:
```bash
git clone https://github.com/ml-jku/wind
cd wind
uv sync
```
WIND is trained on ERA5 reanalysis data from the WeatherBench2 benchmark, stored in Zarr format and loaded lazily via xarray/dask.
Two spatial resolutions are supported:
| Resolution | Grid | GCS path |
|---|---|---|
| 1.5° (default) | 240 × 121 | gs://weatherbench2/datasets/era5/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr |
| 0.25° (high-res) | 1440 × 721 | gs://weatherbench2/datasets/era5/1959-2022-6h-1440x721.zarr |
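Both grids are equiangular and include both poles. The number of longitudes is therefore 360° divided by the spacing, and the number of latitudes is 180° divided by the spacing, plus one. Here is a quick check in Python (the helper name is illustrative only):

```python
def grid_shape(spacing_deg: float) -> tuple[int, int]:
    """Return (n_lon, n_lat) for an equiangular lat/lon grid that includes both poles."""
    n_lon = round(360 / spacing_deg)      # longitudes wrap around, so no +1
    n_lat = round(180 / spacing_deg) + 1  # latitudes run from -90° to +90° inclusive
    return n_lon, n_lat

for spacing in (1.5, 0.25):
    print(spacing, grid_shape(spacing))
# 1.5 (240, 121)
# 0.25 (1440, 721)
```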
Common properties:
| Property | Value |
|---|---|
| Time range | 1959–2022, 6-hourly |
| Channels | 70 (see below) |
| Format | Zarr |
Atmospheric fields (70 channels):
| Field | Levels | Channels |
|---|---|---|
| Temperature | 13 pressure levels | 13 |
| Geopotential | 13 pressure levels | 13 |
| Specific humidity | 13 pressure levels | 13 |
| u-component of wind | 13 pressure levels | 13 |
| v-component of wind | 13 pressure levels | 13 |
| 2m temperature | surface | 1 |
| Mean sea level pressure | surface | 1 |
| 10m u-component of wind | surface | 1 |
| 10m v-component of wind | surface | 1 |
| Total precipitation (6hr) | surface | 1 |
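The following sketch shows how the 70 channels add up. It assumes WeatherBench2's ERA5 variable names and its 13 standard pressure levels. The names, level list, and ordering are assumptions for illustration and may differ from the channel order the dataloader actually uses.

```python
# Assumed WeatherBench2 variable names and levels; the repository's channel order may differ.
PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

ATMOS_VARS = [
    "temperature",
    "geopotential",
    "specific_humidity",
    "u_component_of_wind",
    "v_component_of_wind",
]

SURFACE_VARS = [
    "2m_temperature",
    "mean_sea_level_pressure",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "total_precipitation_6hr",
]

# One channel per (variable, level) pair, plus one per surface variable.
CHANNELS = [f"{v}_{lvl}" for v in ATMOS_VARS for lvl in PRESSURE_LEVELS_HPA] + SURFACE_VARS

print(len(CHANNELS))  # 5 * 13 + 5 = 70
```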
Static inputs (land-sea mask, soil type, geopotential at surface, and lat/lon encodings) are additionally provided to the model as conditioning.
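The exact form of the lat/lon encodings is not specified here. One common choice, shown below purely as a hypothetical sketch, uses sines and cosines so that longitude wraps around smoothly at 0°/360°.

```python
import numpy as np

def latlon_encoding(n_lon: int, n_lat: int) -> np.ndarray:
    """Hypothetical static encoding of shape (4, n_lat, n_lon): sin/cos of lat and lon."""
    lat = np.deg2rad(np.linspace(-90.0, 90.0, n_lat))                  # poles included
    lon = np.deg2rad(np.linspace(0.0, 360.0, n_lon, endpoint=False))   # 360° == 0°, so excluded
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")                # each (n_lat, n_lon)
    return np.stack([np.sin(lat2d), np.cos(lat2d), np.sin(lon2d), np.cos(lon2d)])

enc = latlon_encoding(240, 121)
print(enc.shape)  # (4, 121, 240)
```

Encoding longitude as a (sin, cos) pair avoids the artificial jump between 359° and 0° that a raw angle would introduce.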
The ERA5 data is available on Google Cloud Storage via WeatherBench2. You can either stream it directly from GCS or download it locally.
The dataloader uses `xarray.open_zarr`, which reads `gs://` paths natively via gcsfs. No download is needed; data is streamed on the fly:
```bash
# Install gcsfs for GCS access
uv add gcsfs
```
For both resolutions, the GCS path is the built-in fallback. No `.env` variable is required unless you want to override it with a local copy.
Public GCS access requires no authentication. For faster throughput, run from a GCP instance in us-central1.
Download the Zarr dataset from WeatherBench2 on GCS using gsutil:
```bash
# 1.5° (~100 GB)
gsutil -m cp -r gs://weatherbench2/datasets/era5/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr /path/to/local/
# 0.25° (~7 TB)
gsutil -m cp -r gs://weatherbench2/datasets/era5/1959-2022-6h-1440x721.zarr /path/to/local/
```
Then point to the local paths via `.env` (see below).
Data paths (and other environment variables) are set in a `.env` file at the project root. Create it there and edit it:
```bash
# ERA5 data paths: local path or gs:// URI.
# If ERA5_1P5DEG_PATH is unset, falls back to the GCS URI in configs/data/era5_1p5deg.yaml.
# If ERA5_0P25DEG_PATH is unset, falls back to the GCS URI in configs/data/era5_0p25deg.yaml.
ERA5_1P5DEG_PATH=/path/to/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr
ERA5_0P25DEG_PATH=/path/to/1959-2022-6h-1440x721.zarr
```
For each variable, the datamodule uses the value if the path exists locally or starts with `gs://`. Otherwise, it falls back to the `data_dir_global` defined in the corresponding config.
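The fallback rule above can be sketched in a few lines of Python. The sketch assumes the `.env` file has already been loaded into the process environment. `resolve_data_path` is a hypothetical name, and the default URI is passed in explicitly here, whereas the repository takes it from `data_dir_global` in the data config.

```python
import os
from pathlib import Path

def resolve_data_path(env_var: str, default: str) -> str:
    """Use the env value if it is a gs:// URI or an existing local path, else the default."""
    value = os.environ.get(env_var)
    if value and (value.startswith("gs://") or Path(value).exists()):
        return value
    return default

# Example: with ERA5_1P5DEG_PATH unset, or pointing to a missing path, the default GCS URI is used.
```

Checking for existence before accepting a local path means a stale `.env` entry degrades to streaming instead of failing at load time.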
Normalization statistics are precomputed and included in the repository at src/datasets/stats/.
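Here is a minimal sketch of per-channel z-score normalization with precomputed statistics. The function names and array shapes are assumptions for illustration. The file format under `src/datasets/stats/` is not described here.

```python
import numpy as np

def normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score each channel of x (..., C, H, W) using per-channel mean/std of shape (C,)."""
    return (x - mean[:, None, None]) / (std[:, None, None] + eps)

def denormalize(z: np.ndarray, mean: np.ndarray, std: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Invert normalize(), e.g. to map model outputs back to physical units."""
    return z * (std[:, None, None] + eps) + mean[:, None, None]
```

Reshaping the statistics to `(C, 1, 1)` lets NumPy broadcast them over any leading batch/time dimensions and the spatial grid.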
Training is managed via Hydra and PyTorch Lightning.
```bash
# 1.5° resolution (default, 240×121 grid)
uv run python src/train.py experiment=era5/train/era5_1p5deg
# 0.25° resolution (high-res, 1440×721 grid; requires more GPU memory)
uv run python src/train.py experiment=era5/train/era5_0p25deg
```
Alternatively, activate the virtual environment first and run `python` directly:
```bash
source .venv/bin/activate
python src/train.py experiment=era5/train/era5_1p5deg
```
A single training sample at 0.25° takes ~1.4 GB (5 timesteps × 70 channels × 1440 × 704 values × 4 bytes for float32). The default config uses `batch_size=2` and `num_workers=4`. At these defaults, the prefetch queue holds ~23 GB of host RAM per GPU, so reduce `num_workers` if host memory is limited.
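The memory estimate above can be reproduced as follows. The calculation assumes PyTorch's `DataLoader` default `prefetch_factor=2` (batches loaded in advance by each worker). It counts only the raw input tensors, so treat it as a rough estimate. The helper names are illustrative.

```python
BYTES_PER_FLOAT32 = 4

def sample_bytes(timesteps: int, channels: int, n_lon: int, n_lat: int) -> int:
    """Size in bytes of one (T, C, lat, lon) float32 sample."""
    return timesteps * channels * n_lon * n_lat * BYTES_PER_FLOAT32

def prefetch_bytes(sample: int, batch_size: int, num_workers: int, prefetch_factor: int = 2) -> int:
    """Bytes held by prefetched batches: each worker keeps prefetch_factor batches ready."""
    return num_workers * prefetch_factor * batch_size * sample

s = sample_bytes(timesteps=5, channels=70, n_lon=1440, n_lat=704)
q = prefetch_bytes(s, batch_size=2, num_workers=4)
print(f"sample: {s / 1e9:.2f} GB, prefetch queue: {q / 1e9:.1f} GB")
# sample: 1.42 GB, prefetch queue: 22.7 GB
```

The queue scales linearly with `num_workers`, so halving it halves this figure.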
Hydra allows overriding any configuration parameter from the command line:
```bash
# Change batch size and learning rate
uv run python src/train.py experiment=era5/train/era5_1p5deg data.batch_size=16 model.optimizer.lr=1e-4
# Resume from a checkpoint
uv run python src/train.py experiment=era5/train/era5_1p5deg ckpt_path=/path/to/checkpoint.ckpt
# Disable Weights & Biases logging
uv run python src/train.py experiment=era5/train/era5_1p5deg logger=[]
# Run in debug mode
uv run python src/train.py experiment=era5/train/era5_1p5deg debug=default
```
Training is logged to Weights & Biases by default. Configure it in `configs/logger/wandb.yaml` or disable it with `logger=[]`.
All environment variables (W&B credentials, data paths, and runtime settings) are configured in the `.env` file at the project root. A minimal setup looks like this:
```bash
# W&B
WANDB_ENTITY=<your-wandb-entity>
WANDB_BASE_URL=https://api.wandb.ai
WANDB_IGNORE_GLOBS=*.log
# ERA5 data paths (local path or gs:// URI; omit to stream directly from GCS)
ERA5_1P5DEG_PATH=/path/to/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr
ERA5_0P25DEG_PATH=/path/to/1959-2022-6h-1440x721.zarr
```
Set `WANDB_ENTITY` to your own W&B team or username.
If you find our work useful, please consider giving the repository a star 🌟 and citing our paper:
```bibtex
@article{aich2026wind,
  title={WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling},
  author={Michael Aich and Andreas Fürst and Florian Sestak and Carlos Ruiz-Gonzalez
          and Niklas Boers and Johannes Brandstetter},
  year={2026},
  eprint={2602.03924},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.03924},
}
```