This guide walks you through setting up the environment, preparing data, running LoRA fine-tuning, and validating outputs with the built-in policy checks.
This project fine-tunes an instruction model using LoRA adapters and then evaluates response quality with policy-oriented checks:
- Training entrypoint: `training/finetune_lora.py`
- Environment preflight check: `check_env.py`
- Evaluation scripts: `eval/run_policy_eval.py` and `eval/policy_checks.py`
The default base model is `mistralai/Mistral-7B-Instruct-v0.2`, with an automatic fallback to `TinyLlama/TinyLlama-1.1B-Chat-v1.0` if loading fails.
- Python 3.10+ (Python 3.12 preferred when available; Python 3.13 is supported with caveats)
- Optional but strongly recommended: NVIDIA GPU + CUDA for faster training
- A Hugging Face account and token if using gated models
Install dependencies:

```bash
pip install -r requirements.txt
# requirements now target CUDA 12.4-compatible PyTorch wheels (cu124)
```

Note: `requirements.txt` pins package versions. If you deviate, run the preflight checker to confirm compatibility.
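If you do deviate from the pins, a version comparison like the following can flag drift before the preflight run; this sketch (including the `check_pins` name and the injectable `get_version` lookup) is illustrative, not code from the repo.

```python
from importlib.metadata import version, PackageNotFoundError

def check_pins(requirement_lines, get_version=None):
    """Compare `pkg==x.y.z` pins against installed versions.

    Returns a list of (name, pinned, installed) tuples for mismatches.
    `get_version` is injectable for testing; by default it reads the
    installed distribution metadata via importlib.metadata.
    """
    if get_version is None:
        def get_version(name):
            try:
                return version(name)
            except PackageNotFoundError:
                return None  # not installed at all
    problems = []
    for line in requirement_lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned requirements
        name, _, pinned = line.partition("==")
        installed = get_version(name.strip())
        if installed != pinned.strip():
            problems.append((name.strip(), pinned.strip(), installed))
    return problems
```

An empty result means every pinned package matches; anything else is worth re-running `pip install --force-reinstall -r requirements.txt` in a clean venv.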
Copy `.env.example` to `.env` and set real values:

```bash
cp .env.example .env
```

Update at least:

- `HF_TOKEN` (required for gated models)

Optional overrides available in `.env.example`:

- `MODEL_NAME`
- `TRAIN_FILE`
- `VAL_FILE`
- `OUTPUT_DIR`
- `SAVE_DIR`
- `SEED`
- `MAX_SEQ_LENGTH`
Preflight now auto-loads `.env` if present, but loading it in your shell is still useful for training commands (example):

```bash
set -a
source .env
set +a
```

Run:

```bash
python check_env.py --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
```

This verifies:

- `.env` loading and HF token presence (or warns)
- CUDA availability
- PATH contamination for obvious model-path mistakes
- Installed package versions
- Model repository accessibility on Hugging Face
- Tokenizer dependencies (`protobuf`, `sentencepiece`)
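The token and tokenizer-dependency checks can be approximated in a few lines; `preflight_warnings` below is a hypothetical helper that mirrors the spirit of those checks, not the actual `check_env.py` logic:

```python
import importlib.util
import os

def preflight_warnings(env=os.environ):
    """Collect warnings in the style of a preflight check: missing HF token
    and missing tokenizer dependencies. Illustrative sketch only."""
    warnings = []
    if not env.get("HF_TOKEN"):
        warnings.append("HF_TOKEN is not set; gated models will fail to download")
    # The protobuf distribution installs under the import name google.protobuf.
    for dep, module in (("protobuf", "google.protobuf"), ("sentencepiece", "sentencepiece")):
        try:
            found = importlib.util.find_spec(module) is not None
        except ImportError:
            found = False  # parent package itself is missing
        if not found:
            warnings.append(f"tokenizer dependency missing: {dep}")
    return warnings
```

Making the environment injectable keeps the helper testable without mutating `os.environ`.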
If you want CI/automation to fail on warnings:

```bash
python check_env.py --strict --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
```

The expected file format is JSONL with the fields:

- `instruction`
- `context`
- `response`

Each line should be one valid JSON object. Example:

```json
{"instruction": "Explain custody controls", "context": "Public treasury modernization", "response": "Use lawful, sovereign controls with independent audit trails."}
```

By default, training uses:

- `data/train.jsonl`
- `data/val.jsonl`
You can override via CLI flags or environment variables.
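Before a long training run it can be worth validating the dataset shape; this is a minimal standalone validator, assuming only that each line must parse as JSON and carry the three fields above (`validate_jsonl` is illustrative, not part of the repo):

```python
import json

REQUIRED_FIELDS = ("instruction", "context", "response")

def validate_jsonl(path, required=REQUIRED_FIELDS):
    """Return a list of (line_number, problem) pairs for a training JSONL file.

    An empty list means every non-blank line is a valid JSON object with
    all required fields present.
    """
    problems = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc}"))
                continue
            missing = [field for field in required if field not in record]
            if missing:
                problems.append((lineno, f"missing fields: {missing}"))
    return problems
```

Run it against `data/train.jsonl` and `data/val.jsonl` (or your overrides) and fix any reported lines before training.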
Basic command:

```bash
python training/finetune_lora.py \
  --train-file "${TRAIN_FILE:-data/train.jsonl}" \
  --val-file "${VAL_FILE:-data/val.jsonl}" \
  --output-dir "${OUTPUT_DIR:-./govchain-model}" \
  --save-dir "${SAVE_DIR:-govchain-lora}" \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --seed "${SEED:-42}" \
  --max-seq-length "${MAX_SEQ_LENGTH:-1024}" \
  --hf-token "${HF_TOKEN}"
```

What happens during training:

- Train/validation JSONL files are loaded.
- Records are formatted into an instruction/context/response prompt template.
- The script tries the selected base model; if that fails, it attempts the `TinyLlama` fallback.
- Common typo protection: `minstral/...` and the shorthand `mistral-7b-instruct-v0.2` are automatically normalized to `mistralai/Mistral-7B-Instruct-v0.2`.
- LoRA adapters are attached (`q_proj`, `v_proj`; rank 16; alpha 32).
- SFT training runs and checkpoints are saved.
- The final adapter model is saved in `--save-dir`.
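The typo normalization step can be sketched as follows (illustrative; the real logic lives in `training/finetune_lora.py` and may differ in detail):

```python
def normalize_model_name(name: str) -> str:
    """Map known misspellings/shorthands of the default base model to the
    canonical Hugging Face repo id. Sketch of the behavior described above."""
    canonical = "mistralai/Mistral-7B-Instruct-v0.2"
    lowered = name.strip().lower()
    # Common misspelling of the org prefix, e.g. "minstral/Mistral-7B-Instruct-v0.2"
    if lowered.startswith("minstral/"):
        return canonical
    # Bare shorthand without the "mistralai/" org prefix
    if lowered == "mistral-7b-instruct-v0.2":
        return canonical
    return name  # anything else (including the fallback model) passes through
```

Other model names, including the `TinyLlama` fallback, are left untouched.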
Default training config in code:

- Per-device batch size: `2`
- Gradient accumulation: `8`
- Learning rate: `2e-4`
- Epochs: `6`
- Eval/save interval: every `50` steps
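With these defaults, each optimizer step sees per-device batch size × gradient accumulation = 2 × 8 = 16 examples per device. A tiny helper (hypothetical, not in the repo) makes the relationship explicit when trading batch size against accumulation to fit memory:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices

# Defaults above on a single GPU: 2 * 8 * 1 = 16 examples per step
```

If you hit OOM, halving `per_device` and doubling `grad_accum` keeps the effective batch size unchanged.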
Run policy evaluation over one or more JSONL files with `response` fields:

```bash
python eval/run_policy_eval.py \
  --inputs data/val.jsonl data/govchain_redteam_500.jsonl \
  --max-errors 0
```

Behavior:

- Prints totals and failure counts per file
- Displays up to 5 sample failures
- Exits non-zero if total failures exceed `--max-errors`

Current checks include:

- Forbidden phrases (e.g., DeFi/yield-farming style terms)
- Required concept coverage (`public funds`, `sovereign`, `legal`, `audit`)
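In spirit, both check families reduce to case-insensitive substring scans over each response. `check_response` below is an illustrative sketch (the forbidden phrases shown are assumed examples), not the actual rules in `eval/policy_checks.py`:

```python
FORBIDDEN_PHRASES = ("yield farming", "defi")  # assumed examples, not the real list
REQUIRED_CONCEPTS = ("public funds", "sovereign", "legal", "audit")

def check_response(text):
    """Return (violations, missing) for one response: forbidden phrases that
    appear, and required concepts that do not. Sketch only."""
    lowered = text.lower()
    violations = [p for p in FORBIDDEN_PHRASES if p in lowered]
    missing = [c for c in REQUIRED_CONCEPTS if c not in lowered]
    return violations, missing
```

A response passes when both lists come back empty; `eval/run_policy_eval.py` aggregates such failures per file and compares the total against `--max-errors`.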
After fine-tuning, generate model responses first, then run eval against that JSONL:

```bash
python eval/generate_outputs.py \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --adapter-dir "${ADAPTER_DIR:-govchain-lora}" \
  --input "${INPUT_PATH:-data/val.jsonl}" \
  --output "${OUTPUT_PATH:-outputs/generated_outputs.jsonl}" \
  --hf-token "${HF_TOKEN}"

python eval/run_policy_eval.py --inputs outputs/generated_outputs.jsonl --max-errors 0
```

Notes:

- `eval/run_policy_eval.py` reads existing JSONL files; it does not generate outputs itself.
- `eval/generate_outputs.py` includes a tokenizer fallback (`use_fast=False`) and sets an offload folder by default (`./offload`) to avoid `ValueError: We need an offload_dir ...` on constrained GPUs.
- If GPU memory is still tight, force CPU loading/inference with `--device-map cpu`.
```bash
# 1) Install deps
pip install -r requirements.txt

# 2) Configure env
cp .env.example .env
# edit .env and set HF_TOKEN if needed
set -a && source .env && set +a

# 3) Preflight
python check_env.py --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"

# 4) Train
python training/finetune_lora.py \
  --train-file "${TRAIN_FILE:-data/train.jsonl}" \
  --val-file "${VAL_FILE:-data/val.jsonl}" \
  --output-dir "${OUTPUT_DIR:-./govchain-model}" \
  --save-dir "${SAVE_DIR:-govchain-lora}" \
  --model-name "${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}" \
  --seed "${SEED:-42}" \
  --max-seq-length "${MAX_SEQ_LENGTH:-1024}" \
  --hf-token "${HF_TOKEN}"

# 5) Evaluate
python eval/run_policy_eval.py --inputs data/val.jsonl data/govchain_redteam_500.jsonl --max-errors 0
```

Build:

```bash
docker build -t modtrainer .
```

Run (mount local repo + pass token):

```bash
docker run --gpus all --rm -it \
  -v "$(pwd):/app" \
  -e HF_TOKEN="$HF_TOKEN" \
  modtrainer
```

The image's default command launches `python3 training/finetune_lora.py`.
- Model download/auth error: ensure `HF_TOKEN` is set and accepted for the model repo.
- Very slow training: verify CUDA is available (`python check_env.py`).
- OOM errors: reduce `--max-seq-length` or tune batch/accumulation settings in `training/finetune_lora.py`.
- Policy eval failing: inspect the printed violations/missing concepts and adjust generated responses or dataset targets.
- `sentencepiece` build error on Python 3.13 (`cmake`/`pkg-config` missing): ensure `sentencepiece==0.2.1` is installed (already pinned here), or switch to a Python 3.12 virtualenv if your platform lacks wheels.
- `python3.12: command not found` when creating a venv: use your available interpreter instead (`python3 -m venv venv`), then verify compatibility with `python check_env.py --strict`. Only switch interpreters if preflight reports blocking issues.
- Host has CUDA 12.8 installed but the repo requires CUDA 12.4 torch wheels: keep your NVIDIA driver, but make sure the Python wheel is `torch==2.5.1+cu124` from this repo. Use a clean venv, run `pip install --force-reinstall -r requirements.txt`, then verify that `torch.version.cuda` reports `12.4` via `python check_env.py`.
- `FutureWarning` about `resume_download` from `huggingface_hub.file_download`: a harmless deprecation warning from a dependency version mismatch. The training script now suppresses that specific warning; you can permanently resolve it by upgrading to compatible `transformers`/`huggingface_hub` versions when convenient.
- `ValueError: We need an offload_dir ...` while loading an adapter: run inference with `eval/generate_outputs.py` (it configures `--offload-dir ./offload` by default), or pass `--offload-dir <path>` explicitly.
- Terminal shows Python code mixed into your shell command (for example ending with `else full_text.strip()`): your paste accidentally included script lines. Re-run from a saved `.py` file instead of pasting partial blocks directly.
- Training outputs/checkpoints under `--output-dir` (default `./govchain-model`)
- Saved LoRA adapter under `--save-dir` (default `govchain-lora`)
- Policy evaluation summary in terminal output, with a non-zero exit code for threshold breaches