zihaowu25/InvarDiff

InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

Paper: arXiv:2512.05134  ·  PDF

(a) FLUX.1-dev

[Figure: teaser1]

(b) DiT-XL/2

[Figure: teaser2]

Introduction

InvarDiff is a training-free acceleration framework for diffusion models.

Built on feature invariance in deterministic sampling, InvarDiff generates a binary reuse plan across timesteps and layers and applies a step-first, then layer-wise caching policy at inference, reducing redundant compute while preserving fidelity. The method is validated on FLUX.1-dev and DiT-XL/2.
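The step-first, then layer-wise policy can be illustrated with a toy executor. This is a minimal sketch, not the repository's implementation: the function `run_with_plan`, the choice to cache each layer's residual output, and the toy sampler update are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Layer = Callable[[float, int], float]  # hypothetical layer signature: (hidden, t) -> residual


def run_with_plan(
    x: float,
    layers: Sequence[Layer],
    step_gate: Sequence[bool],
    layer_plan: Sequence[Sequence[bool]],
) -> Tuple[float, int]:
    """Toy sampler loop: check the step gate first, then the per-layer plan.

    step_gate[t] is True when step t reuses the previous step's cached update.
    layer_plan[t][l] is True when layer l at step t reuses its cached residual.
    Returns the final state and the number of layer evaluations performed.
    """
    layer_cache: List[Optional[float]] = [None] * len(layers)
    step_cache: Optional[float] = None
    calls = 0
    for t in range(len(step_gate)):
        if step_gate[t] and step_cache is not None:
            update = step_cache  # whole step reused, no layer is evaluated
        else:
            h = x
            for l, layer in enumerate(layers):
                cached = layer_cache[l]
                if layer_plan[t][l] and cached is not None:
                    h = h + cached  # reuse this layer's cached residual
                else:
                    out = layer(h, t)
                    layer_cache[l] = out
                    h = h + out
                    calls += 1
            update = h - x
            step_cache = update
        x = x - 0.1 * update  # stand-in for the real sampler update (assumption)
    return x, calls


# Two toy layers, four steps: step 1 is skipped entirely, step 2 reuses layer 0.
toy_layers = [lambda h, t: 0.5 * h, lambda h, t: 0.25 * h]
gate = [False, True, False, False]
plan = [[False, False], [False, False], [True, False], [False, False]]
_, n_calls = run_with_plan(1.0, toy_layers, gate, plan)  # 5 layer calls instead of 8
```

Because the gate and the plan are fixed before sampling starts, the amount of computation skipped is known in advance, which matches the fixed, predictable schedule described in this README.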

Core contributions

  • Cross-scale invariance identification

    Quantile-based change metrics measure stability at the timestep and layer/module levels, producing an interpretable binary cache matrix C[t, l, s] and a step gate c[t].

  • Two-phase calibration with resampling correction

    A few deterministic runs generate the initial thresholds; a second pass applies resampling correction to mitigate drift under consecutive reuse, yielding robust plans for deployment.

  • Deterministic execution: step-first, layer-wise next

    At runtime, InvarDiff first decides whether an entire step can be reused; if not, it selectively reuses individual modules/layers. The schedule is fixed and predictable, and no model retraining is required.

  • Strong end-to-end speedups with minimal quality loss

    Under the paper's settings, InvarDiff reaches up to 3.31× end-to-end speedup on FLUX.1-dev (T=28) and up to 2.86× on DiT-XL/2 (T=50), with minimal impact on standard quality metrics and results that are qualitatively near-identical to full computation.

  • Practical and easy to reproduce

    A three-step workflow: calibrate → build plan → accelerated sampling, with small calibration sets, JSON plans that can be versioned, and scripts for benchmarking and visualization.
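As a rough illustration of how quantile-based change metrics could be turned into a binary plan and saved as JSON, consider the sketch below. It rests on several assumptions: the helper names `quantile` and `build_plan`, the rule "reuse when the chosen quantile of the observed change is at most the threshold", and the warm-up handling are hypothetical, and the sketch indexes the plan by `[t][l]` only, omitting the third index of C[t, l, s].

```python
import json
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac


def build_plan(
    changes: Sequence[Sequence[Sequence[float]]],
    q: float,
    tau: float,
    warmup: int = 1,
) -> List[List[bool]]:
    """changes[t][l] holds change values observed across calibration runs.

    Entry [t][l] is True (reuse the cache) when the q-quantile of the observed
    change is at most tau. The first `warmup` steps are always computed.
    """
    return [
        [t >= warmup and quantile(vals, q) <= tau for vals in row]
        for t, row in enumerate(changes)
    ]


# Made-up calibration data: 3 timesteps, 2 layers, 2 calibration runs each.
toy_changes = [
    [[0.9, 0.8], [0.7, 0.9]],
    [[0.05, 0.10], [0.5, 0.6]],
    [[0.02, 0.04], [0.03, 0.2]],
]
toy_plan = build_plan(toy_changes, q=0.5, tau=0.1)
plan_json = json.dumps({"layer_plan": toy_plan})  # plain JSON, easy to version
```

A step gate c[t] could be derived in the same way from a per-step change metric and its own threshold. Storing the result as JSON keeps the plan diffable and independent of the model checkpoint.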

Speed–Quality tradeoff on FLUX.1-dev (A800, $T{=}28$)

[Figure: speed_quality_curves]

  • Each point shows the latency and LPIPS of one operating point (35 points in total; calibration is averaged over 5 prompts).

  • Each polyline fixes $\tau_{\mathrm{step}}\in\{0.40,0.50,0.60,0.70,0.75\}$ and sweeps seven preset threshold bundles. A bundle is $(\tau_{\text{warm-up}}, \tau_{\text{dual-attn}}, \tau_{\text{dual-ff}}, \tau_{\text{dual-context-ff}}, \tau_{\text{single-attn}}, \tau_{\text{single-ff}})$.

  • Bundle order is aligned across polylines; only $\tau_{\mathrm{step}}$ changes from one polyline to the next.
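The grid behind the plot can be enumerated directly: five values of $\tau_{\mathrm{step}}$ times seven bundles gives the 35 operating points. The sketch below only indexes the bundles, since their numeric values are not listed here; the names `TAU_STEP`, `BUNDLE_FIELDS`, and `operating_points` are illustrative, not taken from the repository.

```python
from itertools import product

TAU_STEP = (0.40, 0.50, 0.60, 0.70, 0.75)  # one polyline per value
NUM_BUNDLES = 7                            # preset bundles, referenced by index only
BUNDLE_FIELDS = (                          # the six thresholds that make up one bundle
    "warm_up", "dual_attn", "dual_ff",
    "dual_context_ff", "single_attn", "single_ff",
)

# Every (tau_step, bundle) pair is one operating point on the plot.
operating_points = [
    {"tau_step": tau, "bundle_index": b}
    for tau, b in product(TAU_STEP, range(NUM_BUNDLES))
]

# Group points by tau_step; bundle order is identical within every polyline.
polylines = {
    tau: [p["bundle_index"] for p in operating_points if p["tau_step"] == tau]
    for tau in TAU_STEP
}
```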

Overview

FLUX.1-dev

[Figures: images, ours_images, ours_images1, ours_images2]

DiT-XL/2

[Figures: dit_images1, dit_images2]

Citation

If you find InvarDiff useful or interesting for research or applications, please cite this work using the BibTeX below:

@misc{wu2025invardiffcrossscaleinvariancecaching,
      title={InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models}, 
      author={Zihao Wu},
      year={2025},
      eprint={2512.05134},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.05134}, 
}
