This guide walks you through setting up the environment, preparing data, running LoRA fine-tuning, and validating outputs with the built-in policy checks.
This project fine-tunes an instruction model using LoRA adapters and then evaluates response quality with policy-oriented checks:
- Training entrypoint: `training/finetune_lora.py`
- Environment preflight check: `check_env.py`
- Evaluation scripts: `eval/run_policy_eval.py` and `eval/policy_checks.py`
The default base model is `mistralai/Mistral-7B-Instruct-v0.2`, with an automatic fallback to `TinyLlama/TinyLlama-1.1B-Chat-v1.0` if loading fails.
- Python 3.10+ (Python 3.12 preferred when available; Python 3.13 is supported with caveats)
- Optional but strongly recommended: NVIDIA GPU + CUDA for faster training
- A Hugging Face account and token if using gated models
Install dependencies:

```bash
pip install -r requirements.txt
# requirements now target CUDA 12.4-compatible PyTorch wheels (cu124)
```

Note: `requirements.txt` pins package versions. If you deviate, run the preflight checker to confirm compatibility.
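If you do deviate from the pins, a version comparison like the following can flag drift before the preflight run; this sketch (including the `check_pins` name and the injectable `get_version` lookup) is illustrative, not code from the repo.

```python
from importlib.metadata import version, PackageNotFoundError

def check_pins(requirement_lines, get_version=None):
    """Compare `pkg==x.y.z` pins against installed versions.

    Returns a list of (name, pinned, installed) tuples for mismatches.
    `get_version` is injectable for testing; by default it reads the
    installed distribution metadata via importlib.metadata.
    """
    if get_version is None:
        def get_version(name):
            try:
                return version(name)
            except PackageNotFoundError:
                return None  # not installed at all
    problems = []
    for line in requirement_lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned requirements
        name, _, pinned = line.partition("==")
        installed = get_version(name.strip())
        if installed != pinned.strip():
            problems.append((name.strip(), pinned.strip(), installed))
    return problems
```

An empty result means every pinned package matches; anything else is worth re-running `pip install --force-reinstall -r requirements.txt` in a clean venv.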
Copy `.env.example` to `.env` and set real values:

```bash
cp .env.example .env
```

Update at least:

- `HF_TOKEN` (required for gated models)

Optional overrides available in `.env.example`:

- `MODEL_NAME`
- `TRAIN_FILE`
- `VAL_FILE`
- `OUTPUT_DIR`
- `SAVE_DIR`
- `SEED`
- `MAX_SEQ_LENGTH`
Preflight now auto-loads `.env` if present, but loading it in your shell is still useful for training commands (example):

```bash
set -a
source .env
set +a
```

Run:

```bash
python check_env.py --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
```

This verifies:

- `.env` loading and HF token presence (or warns)
- CUDA availability
- PATH contamination for obvious model-path mistakes
- Installed package versions
- Model repository accessibility on Hugging Face
- Tokenizer dependencies (`protobuf`, `sentencepiece`)
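The token and tokenizer-dependency checks can be approximated in a few lines; `preflight_warnings` below is a hypothetical helper that mirrors the spirit of those checks, not the actual `check_env.py` logic:

```python
import importlib.util
import os

def preflight_warnings(env=os.environ):
    """Collect warnings in the style of a preflight check: missing HF token
    and missing tokenizer dependencies. Illustrative sketch only."""
    warnings = []
    if not env.get("HF_TOKEN"):
        warnings.append("HF_TOKEN is not set; gated models will fail to download")
    # The protobuf distribution installs under the import name google.protobuf.
    for dep, module in (("protobuf", "google.protobuf"), ("sentencepiece", "sentencepiece")):
        try:
            found = importlib.util.find_spec(module) is not None
        except ImportError:
            found = False  # parent package itself is missing
        if not found:
            warnings.append(f"tokenizer dependency missing: {dep}")
    return warnings
```

Making the environment injectable keeps the helper testable without mutating `os.environ`.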
If you want CI/automation to fail on warnings:

```bash
python check_env.py --strict --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
```

The expected file format is JSONL with the fields:

- `instruction`
- `context`
- `response`

Each line should be one valid JSON object. Example:

```json
{"instruction": "Explain custody controls", "context": "Public treasury modernization", "response": "Use lawful, sovereign controls with independent audit trails."}
```

By default, training uses:

- `data/train.jsonl`
- `data/val.jsonl`
You can override via CLI flags or environment variables.
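Before a long training run it can be worth validating the dataset shape; this is a minimal standalone validator, assuming only that each line must parse as JSON and carry the three fields above (`validate_jsonl` is illustrative, not part of the repo):

```python
import json

REQUIRED_FIELDS = ("instruction", "context", "response")

def validate_jsonl(path, required=REQUIRED_FIELDS):
    """Return a list of (line_number, problem) pairs for a training JSONL file.

    An empty list means every non-blank line is a valid JSON object with
    all required fields present.
    """
    problems = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc}"))
                continue
            missing = [field for field in required if field not in record]
            if missing:
                problems.append((lineno, f"missing fields: {missing}"))
    return problems
```

Run it against `data/train.jsonl` and `data/val.jsonl` (or your overrides) and fix any reported lines before training.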
Basic command:

```bash
python training/finetune_lora.py \
  --train-file "${TRAIN_FILE:-data/train.jsonl}" \
  --val-file "${VAL_FILE:-data/val.jsonl}" \
  --output-dir "${OUTPUT_DIR:-./govchain-model}" \
  --save-dir "${SAVE_DIR:-govchain-lora}" \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --seed "${SEED:-42}" \
  --max-seq-length "${MAX_SEQ_LENGTH:-1024}" \
  --hf-token "${HF_TOKEN}"
```

What happens during training:

- Train/validation JSONL files are loaded.
- Records are formatted into an instruction/context/response prompt template.
- The script tries the selected base model; if that fails, it attempts the `TinyLlama` fallback.
- Common typo protection: `minstral/...` and the shorthand `mistral-7b-instruct-v0.2` are automatically normalized to `mistralai/Mistral-7B-Instruct-v0.2`.
- LoRA adapters are attached (`q_proj`, `v_proj`; rank 16; alpha 32).
- SFT training runs and checkpoints are saved.
- The final adapter model is saved in `--save-dir`.
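The typo normalization step can be sketched as follows (illustrative; the real logic lives in `training/finetune_lora.py` and may differ in detail):

```python
def normalize_model_name(name: str) -> str:
    """Map known misspellings/shorthands of the default base model to the
    canonical Hugging Face repo id. Sketch of the behavior described above."""
    canonical = "mistralai/Mistral-7B-Instruct-v0.2"
    lowered = name.strip().lower()
    # Common misspelling of the org prefix, e.g. "minstral/Mistral-7B-Instruct-v0.2"
    if lowered.startswith("minstral/"):
        return canonical
    # Bare shorthand without the "mistralai/" org prefix
    if lowered == "mistral-7b-instruct-v0.2":
        return canonical
    return name  # anything else (including the fallback model) passes through
```

Other model names, including the `TinyLlama` fallback, are left untouched.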
Default training config in code:

- Per-device batch size: `2`
- Gradient accumulation: `8`
- Learning rate: `2e-4`
- Epochs: `6`
- Eval/save interval: every `50` steps
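With these defaults, each optimizer step sees per-device batch size × gradient accumulation = 2 × 8 = 16 examples per device. A tiny helper (hypothetical, not in the repo) makes the relationship explicit when trading batch size against accumulation to fit memory:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices

# Defaults above on a single GPU: 2 * 8 * 1 = 16 examples per step
```

If you hit OOM, halving `per_device` and doubling `grad_accum` keeps the effective batch size unchanged.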
Run policy evaluation over one or more JSONL files with `response` fields:

```bash
python eval/run_policy_eval.py \
  --inputs data/val.jsonl data/govchain_redteam_500.jsonl \
  --max-errors 0
```

Behavior:

- Prints totals and failure counts per file
- Displays up to 5 sample failures
- Exits non-zero if total failures exceed `--max-errors`

Current checks include:

- Forbidden phrases (e.g., DeFi/yield-farming style terms)
- Required concept coverage (`public funds`, `sovereign`, `legal`, `audit`)
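In spirit, both check families reduce to case-insensitive substring scans over each response. `check_response` below is an illustrative sketch (the forbidden phrases shown are assumed examples), not the actual rules in `eval/policy_checks.py`:

```python
FORBIDDEN_PHRASES = ("yield farming", "defi")  # assumed examples, not the real list
REQUIRED_CONCEPTS = ("public funds", "sovereign", "legal", "audit")

def check_response(text):
    """Return (violations, missing) for one response: forbidden phrases that
    appear, and required concepts that do not. Sketch only."""
    lowered = text.lower()
    violations = [p for p in FORBIDDEN_PHRASES if p in lowered]
    missing = [c for c in REQUIRED_CONCEPTS if c not in lowered]
    return violations, missing
```

A response passes when both lists come back empty; `eval/run_policy_eval.py` aggregates such failures per file and compares the total against `--max-errors`.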
After fine-tuning, generate model responses first, then run eval against that JSONL:

```bash
python eval/generate_outputs.py \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --adapter-dir "${ADAPTER_DIR:-govchain-lora}" \
  --input "${INPUT_PATH:-data/val.jsonl}" \
  --output "${OUTPUT_PATH:-outputs/generated_outputs.jsonl}" \
  --hf-token "${HF_TOKEN}"

python eval/run_policy_eval.py --inputs outputs/generated_outputs.jsonl --max-errors 0
```

Notes:

- `eval/run_policy_eval.py` reads existing JSONL files; it does not generate outputs itself.
- `eval/generate_outputs.py` includes a tokenizer fallback (`use_fast=False`) and sets an offload folder by default (`./offload`) to avoid `ValueError: We need an offload_dir ...` on constrained GPUs.
- If GPU memory is still tight, force CPU loading/inference with `--device-map cpu`.
```bash
# 1) Install deps
pip install -r requirements.txt

# 2) Configure env
cp .env.example .env
# edit .env and set HF_TOKEN if needed
set -a && source .env && set +a

# 3) Preflight
python check_env.py --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"

# 4) Train
python training/finetune_lora.py \
  --train-file "${TRAIN_FILE:-data/train.jsonl}" \
  --val-file "${VAL_FILE:-data/val.jsonl}" \
  --output-dir "${OUTPUT_DIR:-./govchain-model}" \
  --save-dir "${SAVE_DIR:-govchain-lora}" \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --seed "${SEED:-42}" \
  --max-seq-length "${MAX_SEQ_LENGTH:-1024}" \
  --hf-token "${HF_TOKEN}"

# 5) Evaluate
python eval/run_policy_eval.py --inputs data/val.jsonl data/govchain_redteam_500.jsonl --max-errors 0
```

Build:

```bash
docker build -t modtrainer .
```

Run (mount local repo + pass token):

```bash
docker run --gpus all --rm -it \
  -v "$(pwd):/app" \
  -e HF_TOKEN="$HF_TOKEN" \
  modtrainer
```

The image's default command launches `python3 training/finetune_lora.py`.
- Model download/auth error: ensure `HF_TOKEN` is set and accepted for the model repo.
- Very slow training: verify CUDA is available (`python check_env.py`).
- OOM errors: reduce `--max-seq-length` or tune batch/accumulation settings in `training/finetune_lora.py`.
- Policy eval failing: inspect the printed violations/missing concepts and adjust generated responses or dataset targets.
- `sentencepiece` build error on Python 3.13 (`cmake`/`pkg-config` missing): ensure `sentencepiece==0.2.1` is installed (already pinned here), or switch to a Python 3.12 virtualenv if your platform lacks wheels.
- `python3.12: command not found` when creating a venv: use your available interpreter instead (`python3 -m venv venv`), then verify compatibility with `python check_env.py --strict`. Only switch interpreters if preflight reports blocking issues.
- Host has CUDA 12.8 installed but the repo requires CUDA 12.4 torch wheels: keep your NVIDIA driver, but make sure the Python wheel is `torch==2.5.1+cu124` from this repo. Use a clean venv, run `pip install --force-reinstall -r requirements.txt`, then verify that `torch.version.cuda` reports `12.4` via `python check_env.py`.
- `FutureWarning` about `resume_download` from `huggingface_hub.file_download`: a harmless deprecation warning from a dependency version mismatch. The training script now suppresses that specific warning; you can permanently resolve it by upgrading to compatible `transformers`/`huggingface_hub` versions when convenient.
- `ValueError: We need an offload_dir ...` while loading an adapter: run inference with `eval/generate_outputs.py` (it configures `--offload-dir ./offload` by default), or pass `--offload-dir <path>` explicitly.
- Terminal shows Python code mixed into your shell command (for example ending with `else full_text.strip()`): your paste accidentally included script lines. Re-run from a saved `.py` file instead of pasting partial blocks directly.
- Training outputs/checkpoints under `--output-dir` (default `./govchain-model`)
- Saved LoRA adapter under `--save-dir` (default `govchain-lora`)
- Policy evaluation summary in terminal output, with a non-zero exit code for threshold breaches