FluxSR-style super-resolution pipeline for Z-Image Turbo:
- Stage 0 — generate offline (eps, z0, x0) pairs and placeholder LR images
- FTD training — Flow Trajectory Distillation with LoRA on the Z-Image transformer
- SR inference — one-step or multi-step latent refinement from a degraded latent
- inspect tensor/image/model shapes for debugging
- upload/download dataset bundles with S3
- orchestrate runs with minimal ZenML wrappers
This repository is a research reproduction in progress.
- Current practical result from long Phase 2 runs: outputs are still mostly blurry.
- Possible causes: undertraining, objective weighting, or implementation mismatch vs paper details.
- No confirmed paper-quality reproduction yet due to limited additional H100 budget.
- Keep this in mind when planning experiments and expectations.
- Multiple Phase 2 runs (from short to long schedules) improved color/structure but did not consistently recover sharp high-frequency detail.
- Some later checkpoints produced over-textured/noisy outputs instead of clean sharpening.
- Best qualitative checkpoints were often mid-run; final checkpoints were not reliably best.
- Current implementation should be treated as an experimental baseline, not a validated FluxSR reproduction.
```bash
uv sync
uv pip install -e .
```

For FTD training, install the optional training dependencies:

```bash
uv pip install -e ".[training]"
```

This adds peft (LoRA), lpips (perceptual loss), and wandb (native Phase 2 logging).

For RealESRGAN-style Phase 1 degradation, install:

```bash
uv pip install -e ".[degradation]"
```

This adds basicsr and opencv-python.
```bash
zimagesr-data gather \
  --model-id Tongyi-MAI/Z-Image-Turbo \
  --out-dir ./zimage_offline_pairs \
  --n 2400 \
  --hr-size 1024 \
  --debug \
  --debug-every 1
```

Resume example:

```bash
zimagesr-data gather --out-dir ./zimage_offline_pairs --start-index 285 --n 1200
```

Generate placeholder LR/LR-up files:

```bash
zimagesr-data degrade --out-dir ./zimage_offline_pairs --n 1200
```

Generate LR/LR-up files with RealESRGAN second-order degradation:

```bash
zimagesr-data degrade \
  --out-dir ./zimage_offline_pairs \
  --n 1200 \
  --degradation realesrgan \
  --seed 1234
```

Inspect saved samples and metadata:

```bash
zimagesr-data inspect --out-dir ./zimage_offline_pairs --limit 5
```

Main debug artifacts:

- `zimage_offline_pairs/metadata.json`
- `zimage_offline_pairs/debug_trace.jsonl`
- `zimage_offline_pairs/pairs/<sample_id>/eps.pt`
- `zimage_offline_pairs/pairs/<sample_id>/z0.pt`
- `zimage_offline_pairs/pairs/<sample_id>/x0.png`
Before training, verify that VAE encode/decode preserves image quality.
If decoding z0.pt already looks soft compared to x0.png, that softness
is your quality ceiling — no amount of LoRA training will produce crisper results.
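One way to make that ceiling concrete is a quick PSNR comparison between the original `x0.png` and the `z0_decoded.png` that decode-check writes. The helper below is an illustrative sketch, not part of the CLI; it assumes both images have been loaded (e.g. with Pillow) into equal-shape uint8 RGB arrays.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 images of equal shape.
    Higher is better; identical images give infinity."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

Tracking this number across a few samples tells you whether the VAE round-trip, rather than the LoRA, is the bottleneck.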
```bash
# Check first 10 samples
uv run zimagesr-data decode-check \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --limit 10

# Check specific samples
uv run zimagesr-data decode-check \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --ids 000000,000010,000020
```

Each sample gets:

- `z0_decoded.png` — VAE decode of the z0 latent
- `z0_roundtrip_grid.png` — side-by-side comparison: `x0.png` (original) vs z0 decoded (VAE round-trip)
Upload:

```bash
zimagesr-data s3-upload \
  --out-dir ./zimage_offline_pairs \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001
```

Download:

```bash
zimagesr-data s3-download \
  --out-dir ./zimage_offline_pairs \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001
```

FTD (Flow Trajectory Distillation) trains a LoRA adapter on the Z-Image transformer so it can predict clean latents from degraded ones in a single forward pass. It implements the FluxSR paper's Eq. 16/17 (FTD loss) and Eq. 18/21 (pixel reconstruction loss).
If your pairs directory does not yet contain zL.pt files (VAE-encoded LR images),
generate them first:

```bash
zimagesr-data generate-zl \
  --out-dir ./zimage_offline_pairs \
  --model-id Tongyi-MAI/Z-Image-Turbo
```

Then start training:

```bash
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --max-steps 750 \
  --batch-size 4 \
  --gradient-accumulation-steps 2 \
  --learning-rate 5e-5 \
  --tl 0.25 \
  --lora-rank 16 \
  --save-dir ./zimage_sr_lora_runs/ftd_run \
  --save-every 150
```

Key options:
| Flag | Default | Description |
|---|---|---|
| `--tl` | `0.25` | Truncation level TL |
| `--rec-loss-every` | `8` | Pixel recon loss frequency (`0` to disable) |
| `--lambda-tvlpips` | `1.0` | Weight for TV-LPIPS recon loss |
| `--lambda-z0` | `0.0` | Weight for latent endpoint loss `SmoothL1(z0_hat, z0)` |
| `--lambda-adl` | `0.0` | ADL regularization weight (set > 0 to enable) |
| `--detach-recon / --no-detach-recon` | on | Gradient-free recon (saves VRAM) |
| `--gradient-checkpointing / --no-gradient-checkpointing` | on | Reduce VRAM at cost of speed |
| `--mixed-precision` | `no` | `no`, `fp16`, or `bf16` |
| `--seed` | none | Reproducibility seed |
| `--save-dir` | `./zimage_sr_lora_runs/ftd_run_<timestamp>` | Checkpoint output directory |
| `--wandb / --no-wandb` | off | Enable native WandB logging for training |
| `--wandb-project` | `zimagesr` | WandB project name |
| `--wandb-mode` | `online` | `online` or `offline` |
| `--wandb-log-checkpoints` | on | Log saved LoRA checkpoints as model artifacts |
| `--save-full-state / --no-save-full-state` | off | Save optimizer/scheduler state for resume (large files) |
| `--checkpoint-infer-grid / --no-checkpoint-infer-grid` | off | Save checkpoint-time inference grids (one-step + optional multi-step sweeps) |
| `--wandb-log-checkpoint-grids / --no-wandb-log-checkpoint-grids` | on | Log checkpoint inference grids as WandB images |
| `--checkpoint-eval-ids` | empty | Fixed pair IDs used for checkpoint grids (instead of a random batch sample) |
| `--checkpoint-eval-images-dir` | none | Folder of arbitrary images to include in checkpoint grids |
| `--checkpoint-eval-images-limit` | `4` | Max number of images loaded from `--checkpoint-eval-images-dir` |
| `--checkpoint-eval-input-upscale` | `4.0` | Bicubic upscale factor before VAE encode for eval images |
| `--checkpoint-eval-fit-multiple` | `16` | Resize eval images to a multiple before VAE encode |
| `--checkpoint-sr-scales` | `1.3,1.6` | Extra `sr_scale` values rendered in checkpoint grids (empty to disable) |
| `--checkpoint-refine-steps` | empty | Extra multi-step refinement counts rendered in checkpoint grids (e.g. `4,8,16`) |
| `--resume-from` | none | Resume from a checkpoint directory (auto-detects mode) |
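For intuition about `--lambda-tvlpips`: TV-LPIPS combines a total-variation penalty with LPIPS perceptual distance (per the FluxSR paper). Below is only a minimal anisotropic TV term, written in NumPy for illustration; the trainer's exact formulation may differ.

```python
import numpy as np

def tv_loss(x: np.ndarray) -> float:
    """Anisotropic total variation: mean absolute difference between
    vertically and horizontally adjacent pixels. Smooth regions score
    near zero; noisy/over-textured outputs score high."""
    dh = np.abs(x[..., 1:, :] - x[..., :-1, :]).mean()
    dw = np.abs(x[..., :, 1:] - x[..., :, :-1]).mean()
    return float(dh + dw)
```

This is one diagnostic for the over-textured checkpoints mentioned earlier: their TV score rises even when color/structure look fine.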
Enable WandB example:

```bash
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --wandb \
  --wandb-project zimagesr \
  --wandb-run-name ftd-run-001
```

Checkpoint inference-grid example:

```bash
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --save-every 500 \
  --checkpoint-infer-grid \
  --checkpoint-eval-ids 000000,000123,000777 \
  --checkpoint-eval-images-dir ./eval_images \
  --checkpoint-sr-scales 1.3,1.6 \
  --checkpoint-refine-steps 4,8 \
  --wandb \
  --wandb-log-checkpoint-grids
```

Each checkpoint grid renders a baseline plus configured sweeps, e.g.:

LR | Base SR | LoRA SR (1.0, 1-step) | LoRA SR (1.3) | LoRA SR (1.6) | LoRA SR (4-step) | LoRA SR (8-step) | HR

`--checkpoint-sr-scales` controls the correction-strength sweep and `--checkpoint-refine-steps` controls the multi-step refinement sweep.

Note: `--checkpoint-infer-grid` runs extra forward/decoder passes at each checkpoint step, so it adds runtime and some transient VRAM usage. Keep it off for maximum throughput.
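The multi-step sweep above can be sketched as plain Euler integration from t = TL down to 0. Here `v_fn` stands in for the LoRA transformer's velocity prediction, and applying `sr_scale` at every step is an assumption made for illustration only:

```python
import numpy as np

def refine(zL, v_fn, tl=0.25, steps=4, sr_scale=1.0):
    """Euler refinement from t = TL to t = 0 in `steps` equal steps.
    With steps=1 this reduces to the one-step estimate zL - sr_scale * v * TL."""
    z, t, dt = zL, tl, tl / steps
    for _ in range(steps):
        z = z - sr_scale * v_fn(z, t) * dt
        t -= dt
    return z
```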
If a run is interrupted, resume with --resume-from; the resume mode is
auto-detected from the checkpoint contents.
Important: by default checkpoints are lightweight LoRA-only.
Use --save-full-state during training if you need seamless full-state resume.
```bash
# Full resume — restores optimizer state, RNG, and step counter
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --resume-from ./zimage_sr_lora_runs/ftd_run/lora_step_300 \
  --save-dir ./zimage_sr_lora_runs/ftd_run \
  --max-steps 750
```

Two modes are auto-detected from the checkpoint directory contents:

| Mode | Detected when | Behaviour |
|---|---|---|
| Full | `training_state.json` + `accelerator_state/` present | Seamless resume: optimizer momentum, RNG, and step counter restored |
| Weights-only | Only `adapter_config.json` present | Warm restart: LoRA weights loaded, fresh optimizer at step 0 |
For full resume, the trainer now validates key structural settings
(model_id, LoRA rank/alpha/dropout, and selected training structure flags)
against the checkpoint config before loading state, and raises a clear error on mismatch.
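A sketch of that fail-fast check; the key names here are illustrative, not the trainer's actual config schema:

```python
def validate_resume_config(ckpt_cfg: dict, run_cfg: dict,
                           keys=("model_id", "lora_rank", "lora_alpha", "lora_dropout")):
    """Raise if any structural setting differs between the checkpoint's
    serialized config and the current run's config."""
    mismatches = {k: (ckpt_cfg.get(k), run_cfg.get(k))
                  for k in keys if ckpt_cfg.get(k) != run_cfg.get(k)}
    if mismatches:
        raise ValueError(f"Resume config mismatch (checkpoint vs run): {mismatches}")
```

Failing before any state is loaded is cheaper than debugging a silently mis-shaped LoRA later.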
Weights-only mode is useful for resuming from older checkpoints (before this feature) or from checkpoints saved by other tools:
```bash
# Weights-only resume — loads LoRA weights, starts fresh optimizer at step 0
zimagesr-data train \
  --pairs-dir ./zimage_offline_pairs/pairs \
  --resume-from ./old_checkpoint_without_state \
  --max-steps 750
```

A checkpoint directory looks like:

```
lora_step_300/
  adapter_config.json        # LoRA config (PEFT)
  adapter_model.safetensors  # LoRA weights
  inference_grid_*.png       # (optional) checkpoint inference previews (one per eval sample)
  training_state.json        # (--save-full-state) step counter + serialized config
  accelerator_state/         # (--save-full-state) optimizer state, RNG, scheduler
```

By default only LoRA weights are saved (lightweight). Pass --save-full-state
to also save optimizer/scheduler state for seamless resume.

LoRA checkpoints are saved as PEFT adapters and remain directly usable for inference
even without the training state files.
Each sample directory under pairs/ should contain:
```
pairs/000000/
  eps.pt     # noise tensor (1, 16, 128, 128)
  z0.pt      # clean latent (1, 16, 128, 128)
  zL.pt      # degraded latent (1, 16, 128, 128)
  x0.png     # (optional) HR ground truth for recon loss
  lr_up.png  # (optional) upscaled LR image for zL generation
```
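A quick pre-flight check over that layout can save a failed training launch. This is a sketch, not part of the CLI:

```python
from pathlib import Path

REQUIRED = ("eps.pt", "z0.pt", "zL.pt")   # needed for FTD training
OPTIONAL = ("x0.png", "lr_up.png")        # recon loss / zL generation

def check_pair_dir(sample_dir):
    """Return (missing required, missing optional) files for one pairs/<id>/ dir."""
    d = Path(sample_dir)
    miss_req = [f for f in REQUIRED if not (d / f).exists()]
    miss_opt = [f for f in OPTIONAL if not (d / f).exists()]
    return miss_req, miss_opt
```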
After training, run single-step super-resolution from Python:
```python
import torch
from diffusers import ZImageImg2ImgPipeline, ZImageTransformer2DModel
from zimagesr.training.inference import one_step_sr
from zimagesr.training.lora import load_lora_for_inference
from zimagesr.training.transformer_utils import prepare_cap_feats

device, dtype = "cuda", torch.bfloat16

# Load pipeline (needed for VAE) and base transformer
pipe = ZImageImg2ImgPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=dtype
).to(device)
base_tr = pipe.transformer
base_tr.requires_grad_(False)

# Load LoRA adapter
lora_tr = load_lora_for_inference(base_tr, "path/to/lora_final", device, dtype)

# Prepare null caption features
cap_feats = prepare_cap_feats(pipe, device, dtype)  # (1, 2560)

# Load a degraded latent
zL = torch.load("path/to/pairs/000000/zL.pt", weights_only=True).to(device=device, dtype=dtype)

# Run one-step SR
pil_image = one_step_sr(
    transformer=lora_tr,
    vae=pipe.vae,
    lr_latent=zL,
    tl=0.25,
    sr_scale=1.0,
    refine_steps=1,
    vae_sf=0.3611,
    cap_feats_2d=cap_feats,
)
pil_image.save("sr_output.png")
```

CLI from existing pair directory (zL.pt):
```bash
zimagesr-data infer \
  --model-id Tongyi-MAI/Z-Image-Turbo \
  --lora-path ./zimage_sr_lora_runs/ftd_run/lora_final \
  --pair-dir ./zimage_offline_pairs/pairs/000000 \
  --sr-scale 1.0,1.2 \
  --refine-steps 1,8 \
  --compare-grid \
  --output ./sr_from_pair.png
```

CLI from arbitrary input image:

```bash
zimagesr-data infer \
  --model-id Tongyi-MAI/Z-Image-Turbo \
  --lora-path ./zimage_sr_lora_runs/ftd_run/lora_final \
  --input-image ./my_input.png \
  --input-upscale 4.0 \
  --fit-multiple 16 \
  --sr-scale 0.9 \
  --refine-steps 1,8 \
  --compare-grid \
  --output ./sr_from_image.png
```

Notes for infer:

- In `--pair-dir` mode, the command loads `zL.pt` directly (paper notation).
- In `--input-image` mode, the image is RGB-converted, optionally bicubic-upscaled (`--input-upscale`), then resized to dimensions divisible by `--fit-multiple` before VAE encoding.
- Set `--input-upscale 1.0` if your input is already in the intended pre-upscaled space.
- `--sr-scale` controls correction strength at inference (`z0_hat = zL - sr_scale * v(TL) * TL`).
- `--refine-steps` controls Euler refinement steps from `t=TL` to `0` (`1` reproduces one-step inference).
- Add `--compare-grid` to save `<output>_grid.png` with `LR (decoded) | Base SR | LoRA SR ...` and an optional `HR (ground truth)` column if `x0.png` exists in `--pair-dir`.
Run gather pipeline:

```bash
zimagesr-data zenml-run --mode gather --out-dir ./zimage_offline_pairs --n 1200
```

Run gather pipeline and upload to S3:

```bash
zimagesr-data zenml-run \
  --mode gather \
  --out-dir ./zimage_offline_pairs \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001
```

Run download pipeline:

```bash
zimagesr-data zenml-run \
  --mode download \
  --s3-uri s3://YOUR_BUCKET/zimagesr/run-001 \
  --out-dir ./zimage_offline_pairs
```

The repository ships with zenml.yaml, used by the dedicated bootstrap helper.

1. Edit `zenml.yaml`:
   - set `components.artifact_stores.s3.enabled: true` for S3
   - set the S3 path to your bucket
   - set `components.experiment_tracker.wandb.enabled: true` for WandB
   - optionally set `entity` for WandB
2. Export optional credentials:

   ```bash
   export WANDB_API_KEY=...
   export AWS_PROFILE=...
   ```

3. Bootstrap ZenML components/stacks:

   ```bash
   zimagesr-zenml-bootstrap --config zenml.yaml
   ```

Dry-run preview:

```bash
zimagesr-zenml-bootstrap --config zenml.yaml --dry-run
```

Override activated stack:

```bash
zimagesr-zenml-bootstrap --config zenml.yaml --activate-stack zimagesr-s3-stack
```

Notes:

- The gather pipeline requires a Z-Image pipeline variant that accepts `latents=`.
- FTD training requires a GPU with sufficient VRAM (tested on a 40 GB A100). Reduce `--batch-size` and increase `--gradient-accumulation-steps` for smaller GPUs.
- torch/CUDA installation is environment-specific; install the wheel that matches your CUDA runtime.
- S3 sync uses the `boto3` default credential chain.
- `peft` and `lpips` are only needed for training and are not required for data generation or S3 sync.
If you use this repository, cite both the implementation and the upstream papers.
```bibtex
@software{zimagesr_2026,
  title  = {ZImageSR: FluxSR-style Super-Resolution Pipeline for Z-Image Turbo},
  author = {Krzysztof Gonia},
  year   = {2026},
  note   = {Local project repository}
}

@article{li2025fluxsr,
  title   = {One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation},
  author  = {Li, Jianze and Cao, Jiezhang and Guo, Yong and Li, Wenbo and Zhang, Yulun},
  journal = {arXiv preprint arXiv:2502.01993},
  year    = {2025}
}

@article{cai2025zimage,
  title   = {Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
  author  = {Cai, Huanqia and Cao, Sihan and Du, Ruoyi and Gao, Peng and Hoi, Steven and others},
  journal = {arXiv preprint arXiv:2511.22699},
  year    = {2025}
}
```