greenarmor/modtrainer


ModTrainer Fine-Tuning How-To (Start to End)

This guide walks you through setting up the environment, preparing data, running LoRA fine-tuning, and validating outputs with the built-in policy checks.

1) What this repository does

This project fine-tunes an instruction model using LoRA adapters and then evaluates response quality with policy-oriented checks:

  • Training entrypoint: training/finetune_lora.py
  • Environment preflight check: check_env.py
  • Evaluation scripts: eval/run_policy_eval.py and eval/policy_checks.py

The default base model is mistralai/Mistral-7B-Instruct-v0.2, with automatic fallback to TinyLlama/TinyLlama-1.1B-Chat-v1.0 if loading fails.
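The fallback behaves roughly like the sketch below. This is an illustration of the pattern, not the repository's actual code; `load_model` stands in for the real Hugging Face loading call, and `fake_loader` is a stub used only to demonstrate the fallback path:

```python
def load_with_fallback(load_model, primary, fallback):
    """Try the primary model id; on any loading error, use the fallback."""
    try:
        return primary, load_model(primary)
    except Exception as err:  # e.g. auth failure, missing repo, OOM
        print(f"Loading {primary} failed ({err}); falling back to {fallback}")
        return fallback, load_model(fallback)

# Stub loader that rejects the gated model, to exercise the fallback:
def fake_loader(name):
    if "Mistral" in name:
        raise RuntimeError("401: gated repo")
    return object()

name, model = load_with_fallback(
    fake_loader,
    "mistralai/Mistral-7B-Instruct-v0.2",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)
```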


2) Prerequisites

  • Python 3.10+ (Python 3.12 preferred when available; Python 3.13 is supported with caveats)
  • Optional but strongly recommended: NVIDIA GPU + CUDA for faster training
  • A Hugging Face account and token if using gated models

Install dependencies:

pip install -r requirements.txt
# requirements now target CUDA 12.4-compatible PyTorch wheels (cu124)

Note: requirements.txt pins package versions. If you deviate, run the preflight checker to confirm compatibility.


3) Configure environment variables

Copy .env.example to .env and set real values:

cp .env.example .env

Update at least:

  • HF_TOKEN (required for gated models)

Optional overrides available in .env.example:

  • MODEL_NAME
  • TRAIN_FILE
  • VAL_FILE
  • OUTPUT_DIR
  • SAVE_DIR
  • SEED
  • MAX_SEQ_LENGTH

The preflight script auto-loads .env if present, but exporting it in your shell is still needed so that the training commands below can expand variables like ${MODEL_NAME} (example):

set -a
source .env
set +a
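Auto-loading .env is typically done with a library such as python-dotenv; a dependency-free equivalent looks roughly like this (illustrative sketch, not the exact implementation in check_env.py):

```python
import os

def load_dotenv(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments ignored.
    Variables already set in the environment are not overwritten."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```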

4) Run preflight checks (recommended)

Run:

python check_env.py --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"

This verifies:

  • .env loading and HF token presence (or warns)
  • CUDA availability
  • PATH contamination (catches obvious model-path mistakes)
  • Installed package versions
  • Model repository accessibility on Hugging Face
  • Tokenizer dependencies (protobuf, sentencepiece)

If you want CI/automation to fail on warnings:

python check_env.py --strict --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
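In spirit, the preflight boils down to collecting warnings and failing only in strict mode. The following stdlib-only sketch illustrates the idea; the names, messages, and package list are assumptions, not check_env.py's actual output:

```python
import importlib.util
import os

def preflight(strict=False):
    """Return a list of warnings; exit non-zero in strict mode if any."""
    warnings = []
    if not os.environ.get("HF_TOKEN"):
        warnings.append("HF_TOKEN is not set; gated models will fail to download")
    for pkg in ("torch", "transformers", "peft", "sentencepiece"):
        if importlib.util.find_spec(pkg) is None:
            warnings.append(f"package '{pkg}' is not installed")
    if strict and warnings:
        raise SystemExit("\n".join(warnings))
    return warnings
```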

5) Prepare training/validation data

Expected file format is JSONL with fields:

  • instruction
  • context
  • response

Each line should be one valid JSON object. Example:

{"instruction":"Explain custody controls","context":"Public treasury modernization","response":"Use lawful, sovereign controls with independent audit trails."}

By default, training uses:

  • data/train.jsonl
  • data/val.jsonl

You can override via CLI flags or environment variables.
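Before training, it can be worth confirming that every line parses and carries the three expected fields. This is a convenience sketch using only the standard library; the repo does not ship this exact script:

```python
import json

REQUIRED = ("instruction", "context", "response")

def validate_jsonl_line(line, lineno=0):
    """Return a list of problems for one JSONL line (empty if valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"line {lineno}: invalid JSON ({err})"]
    if not isinstance(record, dict):
        return [f"line {lineno}: expected a JSON object"]
    return [f"line {lineno}: missing field '{f}'" for f in REQUIRED if f not in record]

good = '{"instruction":"Explain custody controls","context":"Public treasury modernization","response":"Use lawful, sovereign controls with independent audit trails."}'
assert validate_jsonl_line(good) == []
```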


6) Launch LoRA fine-tuning

Basic command:

python training/finetune_lora.py \
  --train-file "${TRAIN_FILE:-data/train.jsonl}" \
  --val-file "${VAL_FILE:-data/val.jsonl}" \
  --output-dir "${OUTPUT_DIR:-./govchain-model}" \
  --save-dir "${SAVE_DIR:-govchain-lora}" \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --seed "${SEED:-42}" \
  --max-seq-length "${MAX_SEQ_LENGTH:-1024}" \
  --hf-token "${HF_TOKEN}"

What happens during training:

  1. Train/validation JSONL files are loaded.
  2. Records are formatted into an instruction/context/response prompt template.
  3. The script tries the selected base model; if that fails, it attempts TinyLlama fallback.
    • Common typo protection: minstral/... and shorthand mistral-7b-instruct-v0.2 are automatically normalized to mistralai/Mistral-7B-Instruct-v0.2.
  4. LoRA adapters are attached (q_proj, v_proj; rank 16; alpha 32).
  5. SFT training runs and checkpoints are saved.
  6. Final adapter model is saved in --save-dir.
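Step 2 (prompt formatting) amounts to rendering each record into a single training string. The section headers below are illustrative assumptions; the exact template lives in training/finetune_lora.py and may differ:

```python
def format_example(record):
    """Render one JSONL record into an instruction/context/response prompt.
    Header wording is a hypothetical stand-in for the script's template."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Context:\n{record['context']}\n\n"
        f"### Response:\n{record['response']}"
    )

example = {
    "instruction": "Explain custody controls",
    "context": "Public treasury modernization",
    "response": "Use lawful, sovereign controls with independent audit trails.",
}
prompt = format_example(example)
```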

Default training config in code:

  • Per-device batch size: 2
  • Gradient accumulation: 8
  • Learning rate: 2e-4
  • Epochs: 6
  • Eval/save interval: every 50 steps

7) Evaluate policy compliance

Run policy evaluation over one or more JSONL files whose records contain a response field:

python eval/run_policy_eval.py \
  --inputs data/val.jsonl data/govchain_redteam_500.jsonl \
  --max-errors 0

Behavior:

  • Prints totals and failure counts per file
  • Displays up to 5 sample failures
  • Exits non-zero if total failures exceed --max-errors

Current checks include:

  • Forbidden phrases (e.g., DeFi/yield-farming style terms)
  • Required concept coverage (public funds, sovereign, legal, audit)
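Conceptually, both checks reduce to case-insensitive substring tests over each response. The phrase and concept lists below are abbreviated placeholders; the real, more extensive lists live in eval/policy_checks.py:

```python
# Hypothetical, abbreviated lists; see eval/policy_checks.py for the real ones.
FORBIDDEN = ("yield farming", "apy", "staking rewards")
REQUIRED_CONCEPTS = ("public funds", "sovereign", "legal", "audit")

def check_response(text):
    """Return (violations, missing_concepts) for one response string."""
    lowered = text.lower()
    violations = [p for p in FORBIDDEN if p in lowered]
    missing = [c for c in REQUIRED_CONCEPTS if c not in lowered]
    return violations, missing

ok = "Public funds require sovereign, legal custody with an audit trail."
assert check_response(ok) == ([], [])
```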

Generate outputs/generated_outputs.jsonl from your adapter

After fine-tuning, create model responses first, then run eval against that JSONL:

python eval/generate_outputs.py \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --adapter-dir "${ADAPTER_DIR:-govchain-lora}" \
  --input "${INPUT_PATH:-data/val.jsonl}" \
  --output "${OUTPUT_PATH:-outputs/generated_outputs.jsonl}" \
  --hf-token "${HF_TOKEN}"

python eval/run_policy_eval.py --inputs outputs/generated_outputs.jsonl --max-errors 0

Notes:

  • eval/run_policy_eval.py reads existing JSONL files; it does not generate outputs itself.
  • eval/generate_outputs.py includes tokenizer fallback (use_fast=False) and sets an offload folder by default (./offload) to avoid ValueError: We need an offload_dir ... on constrained GPUs.
  • If GPU memory is still tight, force CPU loading/inference with --device-map cpu.

8) End-to-end quick run (copy/paste)

# 1) Install deps
pip install -r requirements.txt

# 2) Configure env
cp .env.example .env
# edit .env and set HF_TOKEN if needed
set -a && source .env && set +a

# 3) Preflight
python check_env.py --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"

# 4) Train
python training/finetune_lora.py \
  --train-file "${TRAIN_FILE:-data/train.jsonl}" \
  --val-file "${VAL_FILE:-data/val.jsonl}" \
  --output-dir "${OUTPUT_DIR:-./govchain-model}" \
  --save-dir "${SAVE_DIR:-govchain-lora}" \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --seed "${SEED:-42}" \
  --max-seq-length "${MAX_SEQ_LENGTH:-1024}" \
  --hf-token "${HF_TOKEN}"

# 5) Evaluate
python eval/run_policy_eval.py --inputs data/val.jsonl data/govchain_redteam_500.jsonl --max-errors 0

9) Optional: run with Docker

Build:

docker build -t modtrainer .

Run (mount local repo + pass token):

docker run --gpus all --rm -it \
  -v "$(pwd):/app" \
  -e HF_TOKEN="$HF_TOKEN" \
  modtrainer

The image's default command runs python3 training/finetune_lora.py.


10) Troubleshooting

  • Model download/auth error: ensure HF_TOKEN is set and accepted for the model repo.
  • Very slow training: verify CUDA is available (python check_env.py).
  • OOM errors: reduce --max-seq-length or tune batch/accumulation settings in training/finetune_lora.py.
  • Policy eval failing: inspect printed violations/missing concepts and adjust generated responses or dataset targets.
  • sentencepiece build error on Python 3.13 (cmake/pkg-config missing): ensure sentencepiece==0.2.1 is installed (already pinned here), or switch to a Python 3.12 virtualenv if your platform lacks wheels.
  • python3.12: command not found when creating a venv: use your available interpreter instead (python3 -m venv venv), then verify compatibility with python check_env.py --strict. Only switch interpreters if preflight reports blocking issues.
  • Host has CUDA 12.8 installed but repo requires CUDA 12.4 torch wheels: keep your NVIDIA driver, but make sure the Python wheel is torch==2.5.1+cu124 from this repo. Use a clean venv and run pip install --force-reinstall -r requirements.txt, then verify torch.version.cuda reports 12.4 via python check_env.py.
  • FutureWarning about resume_download from huggingface_hub.file_download: this is a harmless deprecation warning from a dependency stack mismatch. The training script now suppresses that specific warning, and you can permanently resolve it by upgrading to compatible transformers/huggingface_hub versions when convenient.
  • ValueError: We need an offload_dir ... while loading adapter: run inference with eval/generate_outputs.py (it configures --offload-dir ./offload by default), or pass --offload-dir <path> explicitly.
  • Terminal shows Python code mixed into your shell command (for example ending with else full_text.strip()): your paste accidentally included script lines. Re-run from a saved .py file instead of pasting partial blocks directly.

11) Output artifacts you should expect

  • Training outputs/checkpoints under --output-dir (default ./govchain-model)
  • Saved LoRA adapter under --save-dir (default govchain-lora)
  • Policy evaluation summary in terminal output with non-zero exit for threshold breaches
