Skip to content

Releases: KempnerInstitute/KempnerForge

KempnerForge v0.1.0

20 Apr 22:27
37ca7e1

Choose a tag to compare

KempnerForge v0.1.0 - Initial Release

PyTorch-native codebase for training foundation models on AI clusters. Fault-tolerant, resilient, distributed, and agent-ready. Built for NeuroAI and general AI research, single GPU to multi-node SLURM, with maximum GPU utilization as a first-class goal.

Highlights

  • Decoder-only Transformer + MoE: GQA attention with RoPE, SwiGLU MLP, RMSNorm. MoE variants with softmax and sigmoid top-k routing (DeepSeek-V3 style aux-loss-free balancing), sequence-level auxiliary loss, per-expert gradient scaling, and adaptive bias schedule.
  • 4 optimizers, 6 LR schedulers: AdamW (fused), Lion, Muon (Newton-Schulz orthogonalization), Schedule-Free AdamW. Schedulers: cosine, linear, WSD, constant, REX, none.
  • Full parallelism stack: FSDP2 via fully_shard(), tensor parallelism, pipeline parallelism, expert parallelism for MoE, DeviceMesh composition, and FP8 mixed precision hooks.
  • Distributed checkpointing with exact resume: DCP sharded save and load, non-blocking async writes, auto-resume via the latest symlink. Full state restore including dataloader position and RNG.
  • SLURM-friendly and resilient: SIGTERM / SIGUSR1 handlers for preemption, SLURM requeue-aware via SLURM_RESTART_COUNT, NaN detection with configurable action (warn, skip, raise), GPU and NCCL liveness probes. Multi-node via srun direct with InfiniBand auto-detection.
  • Agent-ready plugin (v0.1): six skills (cluster-config, smoke-test, slurm-launch, explain-architecture, add-optimizer, component-gaps). Every skill gates on a check_env.py preflight that returns actionable fix lines instead of silent failure.

Other Changes

  • Benchmarks: bench_forward.py, bench_moe.py, bench_optimizer.py, plus MFU scaling (dense up to 70B, MoE up to 32 GPUs with EP) and packed-MoE benchmarks.
  • Documentation: Sphinx site under docs/ with architecture walkthrough, how-to guides, configuration reference, subsystem index, and a dedicated "Agent Integration" section.
  • Tests: 852 unit tests, integration tests for checkpoint round-trip and data resumption, multi-GPU distributed tests via torchrun, opt-in e2e tests (4 GPUs, up to 7B).

Install / Upgrade

git clone https://github.com/KempnerInstitute/KempnerForge.git
cd KempnerForge
uv sync

# Install the agent plugin inside your Claude session (optional):                                                                                                                                                                        

/plugin marketplace add /abs/path/to/KempnerForge                                                                                                                                                           
/plugin install kempnerforge@kempnerforge
/reload-plugins
/kempnerforge:cluster-config