FluxSR-style super-resolution pipeline for Z-Image Turbo:
- Stage 0 — generate offline (eps, z0, x0) pairs and placeholder LR images
- FTD training — Flow Trajectory Distillation with LoRA on the Z-Image transformer
- SR inference — one-step or multi-step latent refinement from a degraded latent
- inspect tensor/image/model shapes for debugging
- upload/download dataset bundles with S3
- orchestrate runs with minimal ZenML wrappers
This repository is a research reproduction in progress.
- Current practical result from long Phase 2 runs: outputs are still mostly blurry.
- Possible causes: undertraining, objective weighting, or implementation mismatch vs paper details.
- No confirmed paper-quality reproduction yet due to limited additional H100 budget.
- Keep this in mind when planning experiments and expectations.
- Multiple Phase 2 runs (from short to long schedules) improved color/structure but did not consistently recover sharp high-frequency detail.
- Some later checkpoints produced over-textured/noisy outputs instead of clean sharpening.
- Best qualitative checkpoints were often mid-run; final checkpoints were not reliably best.
- Current implementation should be treated as an experimental baseline, not a validated FluxSR reproduction.
```bash
uv sync
uv pip install -e .
```

For FTD training, install the optional training dependencies:

```bash
uv pip install -e ".[training]"
```

This adds peft (LoRA), lpips (perceptual loss), and wandb (native Phase 2 logging).

For RealESRGAN-style Phase 1 degradation, install:

```bash
uv pip install -e ".[degradation]"
```

This adds basicsr and opencv-python.
```bash
zimagesr-data gather \
  --model-id Tongyi-MAI/Z-Image-Turbo \
  --out-dir ./zimage_offline_pairs \
  --n 2400 \
  --hr-size 1024 \
  --debug \
  --debug-every 1
```

Resume example:

```bash
zimagesr-data gather --out-dir ./zimage_offline_pairs --start-index 285 --n 1200
```

Generate placeholder LR/LR-up files:

```bash
zimagesr-data degrade --out-dir ./zimage_offline_pairs --n 1200
```

Generate LR/LR-up files with RealESRGAN second-order degradation:

```bash
zimagesr-data degrade \
  --out-dir ./zimage_offline_pairs \
  --n 1200 \
  --degradation realesrgan \
  --seed 1234
```

Inspect saved samples and metadata:

```bash
zimagesr-data inspect --out-dir ./zimage_offline_pairs --limit 5
```

Main debug artifacts:

- `zimage_offline_pairs/metadata.json`
- `zimage_offline_pairs/debug_trace.jsonl`
- `zimage_offline_pairs/pairs/<sample_id>/eps.pt`
- `zimage_offline_pairs/pairs/<sample_id>/z0.pt`
- `zimage_offline_pairs/pairs/<sample_id>/x0.png`
Before training, verify that VAE encode/decode preserves image quality.
If decoding z0.pt already looks soft compared to x0.png, that softness
is your quality ceiling — no amount of LoRA training will produce crisper results.
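One way to make that ceiling concrete is a quick PSNR comparison between the original `x0.png` and the `z0_decoded.png` that decode-check writes. The helper below is an illustrative sketch, not part of the CLI; it assumes both images have been loaded (e.g. with Pillow) into equal-shape uint8 RGB arrays.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 images of equal shape.
    Higher is better; identical images give infinity."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

Tracking this number across a few samples tells you whether the VAE round-trip, rather than the LoRA, is the bottleneck.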
```bash
# Check first 10 samples
uv run zimagesr-data decode-check \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --limit 10

# Check specific samples
uv run zimagesr-data decode-check \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --ids 000000,000010,000020
```

Each sample gets:

- `z0_decoded.png` — VAE decode of the z0 latent
- `z0_roundtrip_grid.png` — side-by-side comparison: `x0.png` (original) vs z0 decoded (VAE round-trip)
Upload:

```bash
zimagesr-data s3-upload \
  --out-dir ./zimage_offline_pairs \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001
```

Download:

```bash
zimagesr-data s3-download \
  --out-dir ./zimage_offline_pairs \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001
```

FTD (Flow Trajectory Distillation) trains a LoRA adapter on the Z-Image transformer so it can predict clean latents from degraded ones in a single forward pass. It implements the FluxSR paper's Eq. 16/17 (FTD loss) and Eq. 18/21 (pixel reconstruction loss).
If your pairs directory does not yet contain zL.pt files (VAE-encoded LR images),
generate them first:

```bash
zimagesr-data generate-zl \
  --out-dir ./zimage_offline_pairs \
  --model-id Tongyi-MAI/Z-Image-Turbo
```

Then start training:

```bash
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --max-steps 750 \
  --batch-size 4 \
  --gradient-accumulation-steps 2 \
  --learning-rate 5e-5 \
  --tl 0.25 \
  --lora-rank 16 \
  --save-dir ./zimage_sr_lora_runs/ftd_run \
  --save-every 150
```

Key options:
| Flag | Default | Description |
|---|---|---|
| `--tl` | `0.25` | Truncation level TL |
| `--rec-loss-every` | `8` | Pixel recon loss frequency (`0` to disable) |
| `--lambda-tvlpips` | `1.0` | Weight for TV-LPIPS recon loss |
| `--lambda-z0` | `0.0` | Weight for latent endpoint loss `SmoothL1(z0_hat, z0)` |
| `--lambda-adl` | `0.0` | ADL regularization weight (set > 0 to enable) |
| `--detach-recon / --no-detach-recon` | on | Gradient-free recon (saves VRAM) |
| `--gradient-checkpointing / --no-gradient-checkpointing` | on | Reduce VRAM at cost of speed |
| `--mixed-precision` | `no` | `no`, `fp16`, or `bf16` |
| `--seed` | none | Reproducibility seed |
| `--save-dir` | `./zimage_sr_lora_runs/ftd_run_<timestamp>` | Checkpoint output directory |
| `--wandb / --no-wandb` | off | Enable native WandB logging for training |
| `--wandb-project` | `zimagesr` | WandB project name |
| `--wandb-mode` | `online` | `online` or `offline` |
| `--wandb-log-checkpoints` | on | Log saved LoRA checkpoints as model artifacts |
| `--save-full-state / --no-save-full-state` | off | Save optimizer/scheduler state for resume (large files) |
| `--checkpoint-infer-grid / --no-checkpoint-infer-grid` | off | Save checkpoint-time inference grids (one-step + optional multi-step sweeps) |
| `--wandb-log-checkpoint-grids / --no-wandb-log-checkpoint-grids` | on | Log checkpoint inference grids as WandB images |
| `--checkpoint-eval-ids` | empty | Fixed pair IDs used for checkpoint grids (instead of a random batch sample) |
| `--checkpoint-eval-images-dir` | none | Folder of arbitrary images to include in checkpoint grids |
| `--checkpoint-eval-images-limit` | `4` | Max number of images loaded from `--checkpoint-eval-images-dir` |
| `--checkpoint-eval-input-upscale` | `4.0` | Bicubic upscale factor before VAE encode for eval images |
| `--checkpoint-eval-fit-multiple` | `16` | Resize eval images to a multiple before VAE encode |
| `--checkpoint-sr-scales` | `1.3,1.6` | Extra `sr_scale` values rendered in checkpoint grids (empty to disable) |
| `--checkpoint-refine-steps` | empty | Extra multi-step refinement counts rendered in checkpoint grids (e.g. `4,8,16`) |
| `--resume-from` | none | Resume from a checkpoint directory (auto-detects mode) |
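For intuition about `--lambda-tvlpips`: TV-LPIPS combines a total-variation penalty with LPIPS perceptual distance (per the FluxSR paper). Below is only a minimal anisotropic TV term, written in NumPy for illustration; the trainer's exact formulation may differ.

```python
import numpy as np

def tv_loss(x: np.ndarray) -> float:
    """Anisotropic total variation: mean absolute difference between
    vertically and horizontally adjacent pixels. Smooth regions score
    near zero; noisy/over-textured outputs score high."""
    dh = np.abs(x[..., 1:, :] - x[..., :-1, :]).mean()
    dw = np.abs(x[..., :, 1:] - x[..., :, :-1]).mean()
    return float(dh + dw)
```

This is one diagnostic for the over-textured checkpoints mentioned earlier: their TV score rises even when color/structure look fine.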
Enable WandB example:

```bash
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --wandb \
  --wandb-project zimagesr \
  --wandb-run-name ftd-run-001
```

Checkpoint inference-grid example:

```bash
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --save-every 500 \
  --checkpoint-infer-grid \
  --checkpoint-eval-ids 000000,000123,000777 \
  --checkpoint-eval-images-dir ./eval_images \
  --checkpoint-sr-scales 1.3,1.6 \
  --checkpoint-refine-steps 4,8 \
  --wandb \
  --wandb-log-checkpoint-grids
```

Each checkpoint grid renders a baseline plus configured sweeps, e.g.:

LR | Base SR | LoRA SR (1.0, 1-step) | LoRA SR (1.3) | LoRA SR (1.6) | LoRA SR (4-step) | LoRA SR (8-step) | HR

`--checkpoint-sr-scales` controls the correction-strength sweep and `--checkpoint-refine-steps` controls the multi-step refinement sweep.

Note: `--checkpoint-infer-grid` runs extra forward/decoder passes at each checkpoint step, so it adds runtime and some transient VRAM usage. Keep it off for maximum throughput.
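The multi-step sweep above can be sketched as plain Euler integration from t = TL down to 0. Here `v_fn` stands in for the LoRA transformer's velocity prediction, and applying `sr_scale` at every step is an assumption made for illustration only:

```python
import numpy as np

def refine(zL, v_fn, tl=0.25, steps=4, sr_scale=1.0):
    """Euler refinement from t = TL to t = 0 in `steps` equal steps.
    With steps=1 this reduces to the one-step estimate zL - sr_scale * v * TL."""
    z, t, dt = zL, tl, tl / steps
    for _ in range(steps):
        z = z - sr_scale * v_fn(z, t) * dt
        t -= dt
    return z
```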
If a run is interrupted, resume with --resume-from; the resume mode is
auto-detected from the checkpoint contents.
Important: by default checkpoints are lightweight LoRA-only.
Use --save-full-state during training if you need seamless full-state resume.
```bash
# Full resume — restores optimizer state, RNG, and step counter
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --resume-from ./zimage_sr_lora_runs/ftd_run/lora_step_300 \
  --save-dir ./zimage_sr_lora_runs/ftd_run \
  --max-steps 750
```

Two modes are auto-detected from the checkpoint directory contents:

| Mode | Detected when | Behaviour |
|---|---|---|
| Full | `training_state.json` + `accelerator_state/` present | Seamless resume: optimizer momentum, RNG, and step counter restored |
| Weights-only | Only `adapter_config.json` present | Warm restart: LoRA weights loaded, fresh optimizer at step 0 |
For full resume, the trainer now validates key structural settings
(model_id, LoRA rank/alpha/dropout, and selected training structure flags)
against the checkpoint config before loading state, and raises a clear error on mismatch.
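A sketch of that fail-fast check; the key names here are illustrative, not the trainer's actual config schema:

```python
def validate_resume_config(ckpt_cfg: dict, run_cfg: dict,
                           keys=("model_id", "lora_rank", "lora_alpha", "lora_dropout")):
    """Raise if any structural setting differs between the checkpoint's
    serialized config and the current run's config."""
    mismatches = {k: (ckpt_cfg.get(k), run_cfg.get(k))
                  for k in keys if ckpt_cfg.get(k) != run_cfg.get(k)}
    if mismatches:
        raise ValueError(f"Resume config mismatch (checkpoint vs run): {mismatches}")
```

Failing before any state is loaded is cheaper than debugging a silently mis-shaped LoRA later.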
Weights-only mode is useful for resuming from older checkpoints (before this feature) or from checkpoints saved by other tools:
```bash
# Weights-only resume — loads LoRA weights, starts fresh optimizer at step 0
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --resume-from ./old_checkpoint_without_state \
  --max-steps 750
```

A checkpoint directory looks like:

```
lora_step_300/
  adapter_config.json        # LoRA config (PEFT)
  adapter_model.safetensors  # LoRA weights
  inference_grid_*.png       # (optional) checkpoint inference previews (one per eval sample)
  training_state.json        # (--save-full-state) step counter + serialized config
  accelerator_state/         # (--save-full-state) optimizer state, RNG, scheduler
```

By default only LoRA weights are saved (lightweight). Pass --save-full-state
to also save optimizer/scheduler state for seamless resume.

LoRA checkpoints are saved as PEFT adapters and remain directly usable for inference
even without the training state files.
Each sample directory under pairs/ should contain:
```
pairs/000000/
  eps.pt     # noise tensor (1, 16, 128, 128)
  z0.pt      # clean latent (1, 16, 128, 128)
  zL.pt      # degraded latent (1, 16, 128, 128)
  x0.png     # (optional) HR ground truth for recon loss
  lr_up.png  # (optional) upscaled LR image for zL generation
```
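A quick pre-flight check over that layout can save a failed training launch. This is a sketch, not part of the CLI:

```python
from pathlib import Path

REQUIRED = ("eps.pt", "z0.pt", "zL.pt")   # needed for FTD training
OPTIONAL = ("x0.png", "lr_up.png")        # recon loss / zL generation

def check_pair_dir(sample_dir):
    """Return (missing required, missing optional) files for one pairs/<id>/ dir."""
    d = Path(sample_dir)
    miss_req = [f for f in REQUIRED if not (d / f).exists()]
    miss_opt = [f for f in OPTIONAL if not (d / f).exists()]
    return miss_req, miss_opt
```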
After training, run single-step super-resolution from Python:
```python
import torch
from diffusers import ZImageImg2ImgPipeline, ZImageTransformer2DModel
from zimagesr.training.inference import one_step_sr
from zimagesr.training.lora import load_lora_for_inference
from zimagesr.training.transformer_utils import prepare_cap_feats

device, dtype = "cuda", torch.bfloat16

# Load pipeline (needed for VAE) and base transformer
pipe = ZImageImg2ImgPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=dtype
).to(device)
base_tr = pipe.transformer
base_tr.requires_grad_(False)

# Load LoRA adapter
lora_tr = load_lora_for_inference(base_tr, "path/to/lora_final", device, dtype)

# Prepare null caption features
cap_feats = prepare_cap_feats(pipe, device, dtype)  # (1, 2560)

# Load a degraded latent
zL = torch.load("path/to/pairs/000000/zL.pt", weights_only=True).to(device=device, dtype=dtype)

# Run one-step SR
pil_image = one_step_sr(
    transformer=lora_tr,
    vae=pipe.vae,
    lr_latent=zL,
    tl=0.25,
    sr_scale=1.0,
    refine_steps=1,
    vae_sf=0.3611,
    cap_feats_2d=cap_feats,
)
pil_image.save("sr_output.png")
```

CLI from existing pair directory (zL.pt):
```bash
zimagesr-data infer \
  --model-id Tongyi-MAI/Z-Image-Turbo \
  --lora-path ./zimage_sr_lora_runs/ftd_run/lora_final \
  --pair-dir ./zimage_offline_pairs/pairs/000000 \
  --sr-scale 1.0,1.2 \
  --refine-steps 1,8 \
  --compare-grid \
  --output ./sr_from_pair.png
```

CLI from arbitrary input image:

```bash
zimagesr-data infer \
  --model-id Tongyi-MAI/Z-Image-Turbo \
  --lora-path ./zimage_sr_lora_runs/ftd_run/lora_final \
  --input-image ./my_input.png \
  --input-upscale 4.0 \
  --fit-multiple 16 \
  --sr-scale 0.9 \
  --refine-steps 1,8 \
  --compare-grid \
  --output ./sr_from_image.png
```

Notes for infer:

- In `--pair-dir` mode, the command loads `zL.pt` directly (paper notation).
- In `--input-image` mode, the image is RGB-converted, optionally bicubic-upscaled (`--input-upscale`), then resized to dimensions divisible by `--fit-multiple` before VAE encoding.
- Set `--input-upscale 1.0` if your input is already in the intended pre-upscaled space.
- `--sr-scale` controls correction strength at inference (`z0_hat = zL - sr_scale * v(TL) * TL`).
- `--refine-steps` controls Euler refinement steps from `t=TL` to `0` (`1` reproduces one-step inference).
- Add `--compare-grid` to save `<output>_grid.png` with `LR (decoded) | Base SR | LoRA SR ...` and an optional `HR (ground truth)` column if `x0.png` exists in `--pair-dir`.
Run gather pipeline:

```bash
zimagesr-data zenml-run --mode gather --out-dir ./zimage_offline_pairs --n 1200
```

Run gather pipeline and upload to S3:

```bash
zimagesr-data zenml-run \
  --mode gather \
  --out-dir ./zimage_offline_pairs \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001
```

Run download pipeline:

```bash
zimagesr-data zenml-run \
  --mode download \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001 \
  --out-dir ./zimage_offline_pairs
```

The repository ships with zenml.yaml, used by the dedicated bootstrap helper.

1. Edit `zenml.yaml`:
   - set `components.artifact_stores.s3.enabled: true` for S3
   - set the S3 path to your bucket
   - set `components.experiment_tracker.wandb.enabled: true` for WandB
   - optionally set `entity` for WandB
2. Export optional credentials:

   ```bash
   export WANDB_API_KEY=...
   export AWS_PROFILE=...
   ```

3. Bootstrap ZenML components/stacks:

   ```bash
   zimagesr-zenml-bootstrap --config zenml.yaml
   ```

Dry-run preview:

```bash
zimagesr-zenml-bootstrap --config zenml.yaml --dry-run
```

Override activated stack:

```bash
zimagesr-zenml-bootstrap --config zenml.yaml --activate-stack zimagesr-s3-stack
```

Notes:

- The gather pipeline requires a Z-Image pipeline variant that accepts `latents=`.
- FTD training requires a GPU with sufficient VRAM (tested on a 40 GB A100). Reduce `--batch-size` and increase `--gradient-accumulation-steps` for smaller GPUs.
- torch/CUDA installation is environment-specific; install the wheel that matches your CUDA runtime.
- S3 sync uses the `boto3` default credential chain.
- `peft` and `lpips` are only needed for training and are not required for data generation or S3 sync.
If you use this repository, cite both the implementation and the upstream papers.
```bibtex
@software{zimagesr_2026,
  title  = {ZImageSR: FluxSR-style Super-Resolution Pipeline for Z-Image Turbo},
  author = {Krzysztof Gonia},
  year   = {2026},
  note   = {Local project repository}
}

@article{li2025fluxsr,
  title   = {One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation},
  author  = {Li, Jianze and Cao, Jiezhang and Guo, Yong and Li, Wenbo and Zhang, Yulun},
  journal = {arXiv preprint arXiv:2502.01993},
  year    = {2025}
}

@article{cai2025zimage,
  title   = {Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
  author  = {Cai, Huanqia and Cao, Sihan and Du, Ruoyi and Gao, Peng and Hoi, Steven and others},
  journal = {arXiv preprint arXiv:2511.22699},
  year    = {2025}
}
```