If you've found QDM useful for your research or projects, please show your support by starring ⭐ this repo. Thanks!
News: This paper was accepted to CVPR 2026 Findings on March 18, 2026.
Deep learning-based super-resolution (SR) methods often perform pixel-wise computations uniformly across entire images, even in homogeneous regions where high-resolution refinement is redundant. We propose the Quadtree Diffusion Model (QDM), a region-adaptive diffusion framework that leverages a quadtree structure to selectively enhance detail-rich regions while reducing computations in homogeneous areas. By guiding the diffusion with a quadtree derived from the low-quality input, QDM identifies key regions—represented by leaf nodes—where fine detail is essential and applies minimal refinement elsewhere. This mask-guided, two-stream architecture adaptively balances quality and efficiency, producing high-fidelity outputs with low computational redundancy. Experiments demonstrate QDM’s effectiveness in high-resolution SR tasks across diverse image types, particularly in medical imaging (e.g., CT scans), where large homogeneous regions are prevalent. Furthermore, QDM outperforms or is comparable to state-of-the-art SR methods on standard benchmarks while significantly reducing computational costs, highlighting its efficiency and suitability for resource-limited environments.
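To make the quadtree idea concrete, here is a minimal sketch, not the implementation used in this repo. It assumes a block is split whenever its intensity range exceeds a threshold, and it marks the finest-level leaves that are still non-homogeneous as the detail-rich mask. The names `build_quadtree_mask`, `threshold`, and `min_size` are illustrative assumptions; the split criterion used in the paper may differ.

```python
def build_quadtree_mask(img, threshold=10, min_size=2):
    """Return (mask, leaves) for a square 2D list whose side is a power of two.

    mask   : 0/1 grid, 1 where a finest-level leaf still contains detail.
    leaves : list of (row, col, size) for every leaf node of the quadtree.
    """
    n = len(img)
    mask = [[0] * n for _ in range(n)]
    leaves = []

    def block_range(r, c, size):
        # Intensity range inside the block: a simple (assumed) homogeneity test.
        vals = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
        return max(vals) - min(vals)

    def split(r, c, size):
        detailed = block_range(r, c, size) > threshold
        if detailed and size > min_size:
            # Non-homogeneous block: recurse into its four quadrants.
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
            return
        # Leaf node: either homogeneous or already at the finest level.
        leaves.append((r, c, size))
        if detailed:
            for i in range(r, r + size):
                for j in range(c, c + size):
                    mask[i][j] = 1

    split(0, 0, n)
    return mask, leaves


# Usage: an 8x8 image that is flat except for one bright pixel.
img = [[0] * 8 for _ in range(8)]
img[0][0] = 255
mask, leaves = build_quadtree_mask(img)
```

In this toy case only the 2x2 block around the bright pixel ends up in the mask, while the rest of the image is covered by a few large leaves. That is the kind of sparsity a region-adaptive model can exploit.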
- 2026.03.18: Paper accepted to CVPR 2026 Findings.
- 2025.11.18: Released a new arXiv version with tumor region reconstruction and real-world SR results. Refer to the paper for details. Use print_roi_metrics.py to replicate the tumor reconstruction results. Access results for all methods here. Updated real-world SR with Gaussian-weighted patch-level aggregation as per this reference in utils/util_image.py.
- 2025.03.18: Released code and pretrained checkpoints, and updated the README.
- 2025.03.14: Created this repo.
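For readers unfamiliar with the Gaussian-weighted patch-level aggregation mentioned in the 2025.11.18 entry, the following is a generic sketch of the technique, not the code in utils/util_image.py. Overlapping patches are blended with weights that peak at each patch centre, which suppresses seams at patch borders. The function names and the default `sigma = size / 4` are assumptions made for illustration.

```python
import math


def gaussian_window(size, sigma=None):
    """2D Gaussian weights peaking at the patch centre."""
    sigma = sigma or size / 4.0  # assumed default, not taken from the repo
    centre = (size - 1) / 2.0
    g = [math.exp(-((i - centre) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    return [[gi * gj for gj in g] for gi in g]


def aggregate_patches(patches, coords, height, width):
    """Blend overlapping square patches into one height x width image.

    patches: list of 2D lists (size x size); coords: list of (top, left).
    """
    acc = [[0.0] * width for _ in range(height)]   # weighted sum of pixel values
    norm = [[0.0] * width for _ in range(height)]  # sum of weights per pixel
    for patch, (top, left) in zip(patches, coords):
        size = len(patch)
        w = gaussian_window(size)
        for i in range(size):
            for j in range(size):
                acc[top + i][left + j] += w[i][j] * patch[i][j]
                norm[top + i][left + j] += w[i][j]
    # Normalise by the accumulated weight; uncovered pixels stay at 0.
    return [[a / n if n > 0 else 0.0 for a, n in zip(ra, rn)]
            for ra, rn in zip(acc, norm)]


# Usage: two constant 4x4 patches overlapping by two columns.
p1 = [[1.0] * 4 for _ in range(4)]
p2 = [[3.0] * 4 for _ in range(4)]
out = aggregate_patches([p1, p2], [(0, 0), (0, 2)], height=4, width=6)
```

In the overlap, each output pixel leans toward the patch whose centre is closer, so the transition between the two patches is gradual rather than a hard step.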
- More detail (See requirements.txt)
A suitable conda environment named quadtree_diffusion can be created and activated with:
conda create -n quadtree_diffusion python=3.10
conda activate quadtree_diffusion
pip install -r requirements.txt
We provide pretrained checkpoints for the QDM-L model for the following tasks:
- Real-world SR Task: Download Link
- Medical SR Task: Download Link
Note: Place all downloaded weights in the weights directory.
If you have multiple GPUs available, you can accelerate the inference process using the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 inference.py \
-i [Input Directory or Image] \
-o [Output Dir] \
--seed [Seed] \
--chop_bs [Chopping Batch Size] \
--chop_size [Chopping Size] \
--cfg_path [Config Path] \
--ckpt_path [Checkpoint Path] \
--distributed
Otherwise, run inference in a single process:
python inference.py \
-i [Input Directory or Image] \
-o [Output Dir] \
--seed [Seed] \
--chop_bs [Chopping Batch Size] \
--chop_size [Chopping Size] \
--cfg_path [Config Path] \
--ckpt_path [Checkpoint Path]
- When processing very large images, you can adjust --chop_bs to balance efficiency and memory usage.
- We provide multiple configuration files for different tasks in the configs/inference directory. Make sure to select the appropriate configuration file for your specific task.
- You can add the --process argument to output the mask-guided diffusion process demonstrated in the paper.
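As a rough mental model for --chop_size and --chop_bs, the sketch below shows how a large image is commonly cut into overlapping tiles that are then processed in batches. It is an illustration under stated assumptions, not the logic in inference.py: `tile_starts`, `batched_tiles`, and the `stride` parameter are hypothetical names, and the actual overlap used by the script is not documented here.

```python
def tile_starts(length, chop_size, stride):
    """Start offsets of tiles that cover [0, length) along one axis."""
    if length <= chop_size:
        return [0]
    starts = list(range(0, length - chop_size, stride))
    starts.append(length - chop_size)  # final tile sits flush with the border
    return starts


def batched_tiles(height, width, chop_size, stride, chop_bs):
    """Yield lists of (top, left) tile corners, at most chop_bs per batch."""
    coords = [(top, left)
              for top in tile_starts(height, chop_size, stride)
              for left in tile_starts(width, chop_size, stride)]
    for k in range(0, len(coords), chop_bs):
        yield coords[k:k + chop_bs]


# Usage: a 1024x1024 input, 512-pixel tiles, 448-pixel stride, 4 tiles per batch.
batches = list(batched_tiles(1024, 1024, chop_size=512, stride=448, chop_bs=4))
```

A larger batch size means fewer forward passes but more tiles held in memory at once, which is the trade-off the --chop_bs note describes.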
This repository supports two super-resolution (SR) tasks: Real-World SR and Medical CT SR. Follow the steps below to prepare the necessary training and testing datasets.
We integrate training data from six established benchmarks:
- LSDIR – Access Dataset
- DIV2K – Access Dataset
- DIV8K – Access Dataset
- OutdoorSceneTraining – Access Dataset
- Flickr2K – Access Dataset
- FFHQ Subset – A curated selection of 10,000 facial images from the FFHQ dataset
- Filtering OutdoorSceneTraining:
Filter out images with spatial dimensions smaller than 512 pixels. Update the directory path inside the script as needed, then run:
python scripts/filter_images.py
- Synthetic LSDIR_TEST:
Download the pre-synthesized LSDIR_TEST dataset from this link or generate your own by running:
python scripts/prepare_lsdir_test.py
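For the OutdoorSceneTraining filtering step above, the following sketch shows what such a size filter might look like. It is not the contents of scripts/filter_images.py. It assumes "smaller than 512 pixels" refers to the shorter side of the image, and the helper names are hypothetical. The size reader is injectable, so the selection logic can be checked without image files.

```python
def read_size_with_pillow(path):
    """Return (width, height) of an image file using Pillow."""
    # Imported lazily so the selection logic has no hard dependency on Pillow.
    from PIL import Image
    with Image.open(path) as im:
        return im.size


def select_large_images(paths, min_side=512, read_size=read_size_with_pillow):
    """Keep the paths whose shorter side is at least min_side pixels."""
    kept = []
    for p in paths:
        width, height = read_size(p)
        if min(width, height) >= min_side:
            kept.append(p)
    return kept


# Usage with a stand-in size table instead of real files.
sizes = {"a.png": (1024, 768), "b.png": (640, 480), "c.png": (512, 512)}
kept = select_large_images(sizes, read_size=sizes.get)
```

If the script instead drops an image only when both sides are below 512, replace `min` with `max` in the comparison.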
For the medical CT super-resolution task, we utilize clinical CT scans from two well-established segmentation challenges: HaN-Seg and SegRap2023. Download the datasets using the following links:
Start training by running:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path [Config Path] --save_dir [Logging Folder]
We provide multiple configuration files for different tasks in the configs/train directory.
If you find our paper helpful, please consider citing it in your publications. Here is the BibTeX entry:
@misc{yang2025qdmquadtreebasedregionadaptivesparse,
title={QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution},
author={Donglin Yang and Paul Vicol and Xiaojuan Qi and Renjie Liao and Xiaofan Zhang},
year={2025},
eprint={2503.12015},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.12015},
}
This project is licensed under the MIT License. Redistribution and use should follow this license.
This project is primarily based on ResShift and LDM. We also adopt Real-ESRGAN to synthesize the LR/HR pairs, and the QDM design is mainly based on DiT. Thanks to the authors for their awesome work.
If you have any questions, please feel free to contact me via ydlin718@gmail.com.










