This repository is the PixArt-alpha-only split used for our project: an empirical comparison of integer PTQ and low-bit floating-point PTQ for diffusion transformers.
We evaluate whether low-precision floating-point PTQ preserves DiT generation quality better than integer PTQ under equal bit budgets:
- W4A6 (4-bit weights, 6-bit activations)
- W4A8 (4-bit weights, 8-bit activations)
Target models and data:
- PixArt-alpha
- MS-COCO prompts/images for calibration and evaluation
Methods compared:
- Q-DiT: integer PTQ baseline with group quantization and dynamic quantization.
- FP4DiT: floating-point PTQ with module-aware precision and scale-aware rounding.
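To make the contrast between the two families concrete, here is a minimal pure-Python sketch of symmetric per-group integer quantization next to rounding onto a small floating-point grid. It is not the Q-DiT or FP4DiT implementation. The function names are hypothetical, and the minifloat layout (one sign bit, subnormals, no inf/NaN codes, bias `2**(exp_bits-1)-1`) is an assumption made for illustration only.

```python
def int_group_quantize(values, bits=4, group_size=128):
    """Symmetric per-group integer fake-quantization (quantize, then dequantize)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale per group
        for v in group:
            q = max(-qmax, min(qmax, round(v / scale)))
            out.append(q * scale)
    return out


def fp_grid(exp_bits, man_bits):
    """Non-negative values of a toy minifloat (assumed layout, see lead-in)."""
    bias = 2 ** (exp_bits - 1) - 1
    steps = 2 ** man_bits
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(steps):
            if e == 0:  # subnormal range
                values.add((m / steps) * 2.0 ** (1 - bias))
            else:       # normal range with implicit leading 1
                values.add((1 + m / steps) * 2.0 ** (e - bias))
    return sorted(values)


def fp_quantize(values, exp_bits=2, man_bits=1):
    """Scale so the largest magnitude hits the top of the grid, then round to nearest."""
    grid = fp_grid(exp_bits, man_bits)
    max_abs = max(abs(v) for v in values)
    scale = max_abs / grid[-1] if max_abs > 0 else 1.0
    out = []
    for v in values:
        nearest = min(grid, key=lambda g: abs(abs(v) / scale - g))
        out.append((nearest if v >= 0 else -nearest) * scale)
    return out


if __name__ == "__main__":
    print(fp_grid(2, 1))                       # magnitudes of a 4-bit E2M1 grid
    print(int_group_quantize([1.0, -7.0, 3.4]))
    print(fp_quantize([6.0, -2.4, 0.2]))
```

At the same bit width the integer grid is uniformly spaced, while the floating-point grid is denser near zero and coarser at large magnitudes. That difference is what an equal-bit-budget comparison exercises.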
Completed on the Q-DiT side:
- Calibration data collection pipeline
- Quantization and generation pipeline
- Evaluation pipeline for IS/FID/sFID/Precision/Recall
Use requirements-uv.txt (recommended for this split):
uv venv -p 3.9
source .venv/bin/activate
uv pip install --index-url https://download.pytorch.org/whl/cu117 torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1
uv pip install --index-strategy unsafe-best-match -r requirements-uv.txt
The calibration script expects COCO at:
- default: ~/datasets/coco
- override: COCO_ROOT=/path/to/coco
Expected files:
- train2017/
- annotations/captions_train2017.json
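A quick way to confirm the layout before running calibration is sketched below. It assumes the two expected entries are the `train2017/` image directory and `annotations/captions_train2017.json`, which is what unzipping the two COCO archives produces. The helper names are hypothetical and this is not the lookup logic of `scripts/pixart_alpha_calib.py`; it only mirrors the documented default and `COCO_ROOT` override.

```python
import os
from pathlib import Path

# Assumed expected entries, relative to the COCO root (see lead-in).
EXPECTED = ("train2017", "annotations/captions_train2017.json")


def resolve_coco_root(env=None):
    """Return COCO_ROOT if set, else the documented default ~/datasets/coco."""
    env = os.environ if env is None else env
    return Path(env.get("COCO_ROOT", "~/datasets/coco")).expanduser()


def missing_entries(root):
    """List the expected entries that do not exist under root."""
    return [rel for rel in EXPECTED if not (Path(root) / rel).exists()]


if __name__ == "__main__":
    root = resolve_coco_root()
    missing = missing_entries(root)
    print(f"COCO root: {root}")
    print("layout OK" if not missing else f"missing: {missing}")
```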
Download commands (these match the hardcoded layout used by scripts/pixart_alpha_calib.py):
export COCO_ROOT=~/datasets/coco
mkdir -p "$COCO_ROOT"
cd "$COCO_ROOT"
wget -c http://images.cocodataset.org/zips/train2017.zip
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -q train2017.zip
unzip -q annotations_trainval2017.zip
If you run generation with --coco_10k or --coco_9k, this repo also expects:
captions/captions_val2017.json
Create it from COCO annotations:
cd /path/to/DIT-PTQ
mkdir -p captions
cp "$COCO_ROOT/annotations/captions_val2017.json" captions/captions_val2017.json
Run calibration:
python scripts/pixart_alpha_calib.py
This creates pixart_calib_brecq.pt.
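The generation step reads this calibration file through `--cali_data_path`, and the generation commands in this README differ only in the activation setting and in whether a checkpoint is resumed. As a convenience, the sketch below assembles that argument list programmatically. `build_generation_cmd` and `ACT_MANTISSA` are hypothetical names; every flag and value is copied from the commands in this README, and nothing here is verified against the script's argument parser.

```python
import shlex

# Activation width -> mantissa bits, as given by the README placeholders
# (<3_for_A6_or_4_for_A8>).
ACT_MANTISSA = {6: 3, 8: 4}


def build_generation_cmd(act_bit, outdir, cali_ckpt=None):
    """Build the argv list for scripts/pixart_alpha_brecq.py (W4, 512 px, --coco_10k)."""
    if act_bit not in ACT_MANTISSA:
        raise ValueError(f"act_bit must be one of {sorted(ACT_MANTISSA)}")
    cmd = [
        "python", "scripts/pixart_alpha_brecq.py",
        "--plms", "--cond", "--n_samples", "1",
        "--outdir", outdir,
        "--ptq", "--weight_bit", "4", "--quant_mode", "qdiff",
        "--cali_data_path", "pixart_calib_brecq.pt",
        "--cali_batch_size", "16", "--cali_iters", "2500", "--cali_iters_a", "1",
        "--quant_act", "--act_bit", str(act_bit),
        "--act_mantissa_bits", str(ACT_MANTISSA[act_bit]),
    ]
    if cali_ckpt is not None:
        # Resume variant: reuse a saved calibration checkpoint.
        cmd += ["--cali_ckpt", cali_ckpt, "--resume_w"]
    cmd += [
        "--weight_group_size", "128", "--weight_mantissa_bits", "1",
        "--ff_weight_mantissa", "0", "--res", "512", "--coco_10k",
    ]
    return cmd


if __name__ == "__main__":
    # "outputs/w4a6" is a placeholder output directory.
    print(shlex.join(build_generation_cmd(6, "outputs/w4a6")))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.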
Quantize and generate:
python scripts/pixart_alpha_brecq.py --plms --cond --n_samples 1 \
--outdir <output_dir> \
--ptq --weight_bit 4 --quant_mode qdiff \
--cali_data_path pixart_calib_brecq.pt \
--cali_batch_size 16 --cali_iters 2500 --cali_iters_a 1 \
--quant_act --act_bit <6_or_8> --act_mantissa_bits <3_for_A6_or_4_for_A8> \
--weight_group_size 128 --weight_mantissa_bits 1 --ff_weight_mantissa 0 \
--res 512 --coco_10k
To resume from a saved calibration checkpoint, add --cali_ckpt <ckpt> --resume_w:
python scripts/pixart_alpha_brecq.py --plms --cond --n_samples 1 \
--outdir <output_dir> \
--ptq --weight_bit 4 --quant_mode qdiff \
--cali_data_path pixart_calib_brecq.pt \
--cali_batch_size 16 --cali_iters 2500 --cali_iters_a 1 \
--quant_act --act_bit <6_or_8> --act_mantissa_bits <3_for_A6_or_4_for_A8> \
--cali_ckpt <ckpt> --resume_w \
--weight_group_size 128 --weight_mantissa_bits 1 --ff_weight_mantissa 0 \
--res 512 --coco_10k
After generation, run:
python scripts/eval_metrics.py \
--gen_dir <output_dir>/<run_timestamp>/samples_10k \
--real_dir "$COCO_ROOT/val2017" \
--captions_json captions/captions_val2017.json \
--caption_mode coco_10k \
--save_json <output_dir>/<run_timestamp>/metrics.json
Notes:
- --caption_mode coco_10k matches generation with --coco_10k.
- For --coco_9k, use --caption_mode coco_9k and samples_9k.
- The CLIP-score prompt source can also be --prompt_file (one prompt per line) or --prompt (single prompt for all images).
- FID uses clean-fid by default. To reproduce the previous repo-local implementation, pass --fid_backend custom.
- ImageReward_mean is computed against the same validation prompts used for generation unless --skip_imagereward is passed.
- sFID is hidden from the default path for now. If you explicitly want the repo-local experimental spatial metric, pass --compute_sfid.
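Because the caption mode and the samples directory have to agree with the flag used at generation time, it can help to derive both from a single value. The sketch below does that and appends the optional switches described in the notes. `build_eval_cmd` and `MODES` are hypothetical names; the flags are the ones listed in this README and are not checked against the script's parser.

```python
# Generation flag -> (caption mode, samples subdirectory), per the notes above.
MODES = {
    "coco_10k": ("coco_10k", "samples_10k"),
    "coco_9k": ("coco_9k", "samples_9k"),
}


def build_eval_cmd(run_dir, coco_root, gen_flag="coco_10k", fid_backend=None,
                   compute_sfid=False, skip_imagereward=False):
    """Build the argv list for scripts/eval_metrics.py for one generation run."""
    caption_mode, samples = MODES[gen_flag]
    cmd = [
        "python", "scripts/eval_metrics.py",
        "--gen_dir", f"{run_dir}/{samples}",
        "--real_dir", f"{coco_root}/val2017",
        "--captions_json", "captions/captions_val2017.json",
        "--caption_mode", caption_mode,
        "--save_json", f"{run_dir}/metrics.json",
    ]
    if fid_backend is not None:
        cmd += ["--fid_backend", fid_backend]   # e.g. "custom" for the repo-local FID
    if compute_sfid:
        cmd.append("--compute_sfid")            # experimental spatial metric
    if skip_imagereward:
        cmd.append("--skip_imagereward")
    return cmd


if __name__ == "__main__":
    # Both paths are placeholders.
    print(" ".join(build_eval_cmd("outputs/run", "/path/to/coco", "coco_9k")))
```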