Code and pretrained models for the paper *Progressive Checkerboards for Autoregressive Multiscale Image Generation*.
This codebase is based on a combination of several prior codebases.
Pretrained models are available on HuggingFace at deigen/checkerboardgen.
To download them, you can use the download.sh script:

```bash
bash pretrained_models/download.sh
```

The models are also available at the following links:
| Model | Google Drive | HuggingFace |
|---|---|---|
| Autoencoder | gdrive | hf |
| Checkerboard-L-2x | gdrive | hf |
| Checkerboard-L-4x | gdrive | hf |
To download by hand, create the following directories:

```bash
mkdir -p pretrained_models/autoencoder
mkdir -p pretrained_models/checkerboard-L-2x
mkdir -p pretrained_models/checkerboard-L-4x
```

Then download the config files and checkpoints from the links above into their respective directories.
To generate samples using the pretrained models, run the following command:

```bash
python main_sample.py --checkpoint pretrained_models/checkerboard-L-2x/checkpoint-last.pth --cfg 1.5 --steps_per_scale 4
```

To train a new model, use the following command:

```bash
python main.py --config configs/config_256_S.yaml
```

This assumes you have ImageNet downloaded and linked at data/imagenet.
Adjust the config file as needed for different datasets or model configurations.
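The `--steps_per_scale` flag above reflects the core idea of the method: each scale's token grid is generated over several interleaved steps rather than all at once. As a toy illustration only (our own reading of a checkerboard-style split, not the paper's actual partition or the code in this repo), a grid of positions can be divided into interleaved groups like this:

```python
import numpy as np

def checkerboard_groups(h, w, steps=4):
    """Toy sketch: split an h x w grid of token positions into `steps`
    interleaved groups, generalizing a 2-color checkerboard.
    NOTE: illustration only; not the partition used in the paper."""
    rows, cols = np.indices((h, w))
    if steps == 2:
        # Classic checkerboard: group by parity of (row + col).
        return (rows + cols) % 2
    if steps == 4:
        # Four interleaved quarter-grids, one per (row parity, col parity).
        return (rows % 2) * 2 + (cols % 2)
    raise ValueError("toy sketch only supports 2 or 4 steps")

groups = checkerboard_groups(4, 4, steps=4)
# Each group covers an evenly interleaved quarter of the 4x4 grid.
```

Generating one group per step lets every position condition on the previously generated, spatially interleaved neighbors at the same scale.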
To speed up training, you can create a cached version of the training data:

```bash
python main.py --config configs/config_256_S.yaml --create_cache --batch_size 32
```

Then pass the --use_cache flag during training to use it. This creates a cache with all of the scales used for experiments in the paper. To cache only the scales used in one model config, see the code in main.py.
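As a rough illustration of what such a cache buys you (a hypothetical file layout; the actual cache format lives in main.py and may differ), precomputed token grids per scale can be written once and then memory-mapped during training so only the requested slices are read from disk:

```python
import os
import tempfile
import numpy as np

# Hypothetical layout: one .npy file of token ids per scale.
def create_cache(cache_dir, tokens_per_scale):
    os.makedirs(cache_dir, exist_ok=True)
    for scale, tokens in tokens_per_scale.items():
        np.save(os.path.join(cache_dir, f"tokens_{scale}x{scale}.npy"), tokens)

def load_cache(cache_dir, scale):
    # mmap_mode="r" avoids loading the whole array into RAM.
    return np.load(os.path.join(cache_dir, f"tokens_{scale}x{scale}.npy"),
                   mmap_mode="r")

cache_dir = os.path.join(tempfile.mkdtemp(), "token_cache")
fake = {8: np.random.randint(0, 1024, size=(10, 8, 8)),
        16: np.random.randint(0, 1024, size=(10, 16, 16))}
create_cache(cache_dir, fake)
batch = load_cache(cache_dir, 16)[:4]   # slicing reads only what is needed
```

The point of caching is that tokenizing images at every scale is done once, up front, instead of on every training epoch.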
There are two evaluation methods used in this repo.
torch_fidelity is used for inline evaluations during training.
For final evaluations, we use the original TensorFlow implementations, which produce slightly different numbers but are more comparable to prior work. These are in evaluator.py and run_evaluator.sh. The reference set used for these evals is at reference set.
The autoencoder used in this work is a requantized version of the LlamaGen autoencoder. The requantization code is in train_ae_codebook/. The code here is somewhat complex for what it does, but is included for completeness. It retrains just the quantization layer, as described in our paper, using the pretrained weights from LlamaGen link page.
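At its core, the quantization layer being retrained is a nearest-neighbor lookup into a learned codebook. The minimal sketch below (toy sizes, plain NumPy; not LlamaGen's actual code or training loop) shows the forward pass of such a layer:

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry by squared
    Euclidean distance. Toy sketch of a VQ layer's forward pass."""
    # latents: (n, d), codebook: (k, d)
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, minimized over codes c.
    d2 = (latents ** 2).sum(1, keepdims=True) \
         - 2 * latents @ codebook.T \
         + (codebook ** 2).sum(1)
    codes = d2.argmin(axis=1)          # (n,) integer token ids
    return codes, codebook[codes]      # ids and the quantized vectors

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))    # k=16 codes of dimension d=4
latents = codebook[[3, 7, 7]] + 1e-6   # vectors near known entries
codes, quantized = quantize(latents, codebook)
```

Retraining only this layer keeps the encoder and decoder weights frozen while the codebook entries (and hence the discrete token ids) are re-fit.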
- We use the standard reference set for FID evals, rather than the full ImageNet validation set mentioned in the paper (v1).
If you find this code useful for your research, please consider citing the following paper:
```bibtex
@article{progcheck2026,
  title={Progressive Checkerboards for Autoregressive Multiscale Image Generation},
  author={David Eigen},
  journal={arXiv preprint arXiv:2602.03811},
  year={2026}
}
```
