Accepted at ICLR 2026

Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

Anirud Aggarwal, Abhinav Shrivastava, Matthew Gwilliam

University of Maryland, College Park

ECAD discovers efficient caching schedules for diffusion models through genetic algorithms, forming smooth Pareto frontiers that balance image quality and inference speed. Our method works with off-the-shelf models like PixArt-α, PixArt-Σ, and FLUX-1.dev without any parameter tuning.

Abstract

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures.

We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models.

Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-α, PixArt-Σ, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-α, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35× to 2.58×.

Interactive Pareto Frontier

Explore the discovered Pareto frontiers interactively. The visualization shows PixArt-α at 256×256 resolution on PartiPrompts (unseen during optimization). These images give some visual insight into the quality of the images generated by each schedule, while the quality metrics in the tables below provide a more quantitative comparison.


Method Overview

ECAD formulates diffusion caching as a multi-objective optimization problem, discovering Pareto-optimal trade-offs between computational efficiency and generation quality. Our approach uses genetic algorithms to evolve caching schedules represented as binary tensors $S \in \{0,1\}^{N \times B \times C}$, where $N$ is the number of diffusion steps, $B$ is the number of transformer blocks, and $C$ is the number of cacheable components per block.
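As a concrete illustration of this representation, the schedule tensor can be built directly in NumPy. This is a minimal sketch: the sizes below (20 steps, 28 blocks, 3 components) are illustrative assumptions matching a 20-step PixArt-style setup, not the exact configuration of any model.

```python
import numpy as np

# Illustrative sizes: 20 diffusion steps, 28 transformer blocks,
# 3 cacheable components per block (self-attention, cross-attention,
# feedforward). These are assumptions, not exact model dimensions.
N, B, C = 20, 28, 3

rng = np.random.default_rng(0)

# S[n, b, c] = 1 means component c of block b reuses its cached
# output at step n; 0 means it is recomputed.
S = (rng.random((N, B, C)) < 0.5).astype(np.int8)

# The first step has no cache to reuse, so everything is computed.
S[0] = 0

# Fraction of component evaluations skipped -- a crude proxy for
# compute saved, assuming equal cost per component.
cache_ratio = S.mean()
print(f"cached fraction: {cache_ratio:.2f}")
```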

Key Components

  • Component-level Caching: We cache individual transformer components (self-attention, cross-attention, feedforward) rather than entire blocks.
  • Genetic Algorithm: NSGA-II evolves a population of caching schedules using selection, crossover, and mutation operations.
  • Multi-objective Optimization: Simultaneously optimize for low computational cost (TMACs) and high generation quality (Image Reward).
  • Calibration-based: Uses only 100 text prompts for optimization, no image data required.
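The crossover and mutation operations named above can be sketched on binary schedule tensors. The function names and details here are illustrative, not the paper's implementation:

```python
import numpy as np

def four_point_crossover(a, b, rng):
    """Swap two segments between flattened copies of parents a and b,
    delimited by four random cut points (sketch of 4-point crossover)."""
    flat_a, flat_b = a.ravel(), b.ravel()
    cuts = np.sort(rng.choice(flat_a.size, size=4, replace=False))
    child_a, child_b = flat_a.copy(), flat_b.copy()
    for i in (0, 2):  # segments [c0:c1] and [c2:c3] are exchanged
        lo, hi = cuts[i], cuts[i + 1]
        child_a[lo:hi], child_b[lo:hi] = flat_b[lo:hi], flat_a[lo:hi]
    return child_a.reshape(a.shape), child_b.reshape(b.shape)

def bit_flip_mutation(s, p_m, rng):
    """Flip each bit of schedule s independently with probability p_m."""
    flips = rng.random(s.shape) < p_m
    return np.where(flips, 1 - s, s)

rng = np.random.default_rng(0)
parent_a = np.zeros((20, 28, 3), dtype=np.int8)  # never cache
parent_b = np.ones((20, 28, 3), dtype=np.int8)   # always cache
child_a, child_b = four_point_crossover(parent_a, parent_b, rng)
mutated = bit_flip_mutation(child_a, 0.02, rng)
```

Because crossover only exchanges bits between parents, the total number of cached entries across both children is conserved.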
ECAD Algorithm

Input: Diffusion model $\mathcal{M}$, calibration prompts $\mathcal{P} = \{p_1, ..., p_m\}$, population size $n$, generations $G$, crossover probability $p_c$, mutation probability $p_m$

Output: Pareto frontier $\mathcal{F}$ of caching schedules

  1. $\mathcal{P}_0 \leftarrow \text{InitializePopulation}(n)$ // Random and heuristic schedules
  2. for $g = 1$ to $G$ do
  3.     for each schedule $S \in \mathcal{P}_{g-1}$ do
  4.         $\mathcal{I} \leftarrow \mathcal{M}_S(\mathcal{P})$ // Generate images with schedule $S$
  5.         $S_{q} \leftarrow \mathcal{Q}(\mathcal{P}, \mathcal{I})$ // Compute Image Reward score
  6.         $S_{c} \leftarrow \mathcal{C}(S)$ // Compute TMACs
  7.     end for
  8.     $\mathcal{P}_g \leftarrow \text{NSGA-II-Selection}(\mathcal{P}_{g-1})$ // Tournament selection
  9.     $\mathcal{P}_g \leftarrow \text{Crossover}(\mathcal{P}_g, p_c)$ // 4-point crossover
  10.     $\mathcal{P}_g \leftarrow \text{Mutation}(\mathcal{P}_g, p_m)$ // Bit-flip mutation
  11. end for
  12. $\mathcal{F} \leftarrow \text{ComputeParetoFrontier}(\bigcup_{i=1}^{G} \mathcal{P}_i)$
  13. return $\mathcal{F}$

Note: Each schedule $S \in \{0,1\}^{N \times B \times C}$ is a binary tensor where $N$ = diffusion steps, $B$ = transformer blocks, $C$ = cacheable components per block.
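The ComputeParetoFrontier step amounts to non-dominated filtering over (cost, quality) pairs. Below is a minimal O(n²) sketch, assuming cost (TMACs) is minimized and quality (Image Reward) is maximized; `pareto_frontier` is an illustrative name, not the paper's API.

```python
def pareto_frontier(candidates):
    """Return the non-dominated subset of (cost, quality) pairs.

    A candidate is dominated if another candidate is at least as good
    on both objectives and strictly better on at least one.
    """
    front = []
    for i, (ci, qi) in enumerate(candidates):
        dominated = any(
            cj <= ci and qj >= qi and (cj < ci or qj > qi)
            for j, (cj, qj) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((ci, qi))
    return front

# (TMACs, Image Reward) pairs taken from the PixArt-alpha results table.
candidates = [(5.71, 0.97), (2.87, 0.91), (2.13, 0.99),
              (1.46, 0.88), (1.18, 0.77)]
print(pareto_frontier(candidates))
```

Here the three ECAD schedules survive the filter, while the uncached baseline and FORA are dominated by the "fast" schedule, which is both cheaper and higher quality.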

ECAD optimization loop: Text prompts drive the diffusion transformer, evaluated on Image Reward and MACs. A genetic algorithm evolves caching schedules over G generations.

Component-level caching: Instead of caching entire transformer blocks, ECAD selectively caches individual components (self-attention, cross-attention, feedforward) across diffusion timesteps, enabling finer-grained control over the quality-speed trade-off.

Results

PixArt-α: Pareto frontier at 256×256 on PartiPrompts (unseen during calibration), demonstrating superior trade-offs compared to FORA, ToCa, and TGATE.

FLUX-1.dev: Pareto frontiers at 256×256 on PartiPrompts (unseen during calibration). ECAD discovers efficient schedules across a wide range of speedups.

ECAD Evolution: Progressive improvement of Pareto frontiers across 550 generations on PixArt-α. Pareto frontiers are shown as a function of TMACs and Image Reward score on the calibration set. Lighter shades indicate earlier generations; darker shades indicate later ones. The final frontier is denoted in black.

Quantitative Results at 256×256 Resolution

We evaluated ECAD on three popular diffusion models with 20-step generation. Our method consistently outperforms prior approaches across multiple metrics while providing flexible speed-quality trade-offs. Despite being optimized only on Image Reward using 100 calibration prompts, ECAD achieves superior results on unseen benchmarks.

PixArt-α Results
| Method | Setting | TMACs↓ | Speedup↑ | Image Reward↑ | MJHQ FID↓ | MJHQ CLIP↑ |
|---|---|---|---|---|---|---|
| None | - | 5.71 | 1.00× | 0.97 | 9.75 | 32.77 |
| TGATE | m=15, k=1 | 4.86 | 1.14× | 0.87 | 10.38 | 32.33 |
| FORA | N=2 | 2.87 | 1.65× | 0.91 | 10.33 | 32.74 |
| ToCa | N=3, R=90% | 2.13 | 2.35× | 0.68 | 11.80 | 32.35 |
| DuCa | N=3, R=60% | 3.20 | 2.29× | 0.79 | 11.69 | 32.48 |
| DuCa | N=3, R=90% | 2.30 | 2.59× | 0.74 | 12.53 | 32.39 |
| ECAD | fast | 2.13 | 1.97× | 0.99 | 8.02 | 32.78 |
| ECAD | faster | 1.46 | 2.40× | 0.88 | 9.92 | 32.34 |
| ECAD | fastest | 1.18 | 2.58× | 0.77 | 8.67 | 32.24 |
FLUX-1.dev Results
| Method | Setting | TMACs↓ | Speedup↑ | Image Reward↑ | MJHQ FID↓ | MJHQ CLIP↑ |
|---|---|---|---|---|---|---|
| None | - | 198.69 | 1.00× | 1.04 | 17.77 | 31.06 |
| FORA | N=3 | 69.80 | 2.44× | 0.93 | 19.38 | 31.10 |
| ToCa | N=4, R=90% | 42.96* | 1.66×* | 0.93 | 21.59 | 30.88 |
| DiCache | - | 62.23 | 2.26× | 0.97 | 20.70 | 31.18 |
| TaylorSeer | N=5, O=2 | 59.88* | 2.55×* | 0.54 | 24.36 | 30.64 |
| TaylorSeer | N=6, O=1 | 49.97* | 3.03×* | 0.02 | 37.98 | 29.38 |
| ECAD | fast | 63.02 | 2.58× | 1.04 | 16.14 | 31.69 |
| ECAD | fastest | 43.60 | 3.37× | 0.89 | 21.43 | 31.67 |
Resolution Transfer Results (FLUX-1.dev 1024×1024)

One of ECAD's key strengths is its ability to generalize across resolutions. We demonstrate this by applying FLUX-1.dev schedules optimized at 256×256 directly to 1024×1024 image generation, without any additional optimization. Despite the 16× increase in pixel count, our schedules maintain competitive performance compared to methods specifically optimized for high resolution.

| Method | Setting | TMACs↓ | Speedup↑ | Image Reward↑ | COCO FID↓ | COCO CLIP↑ |
|---|---|---|---|---|---|---|
| None | - | 1190.25 | 1.00× | 1.14 | 25.45 | 31.08 |
| None | 40% steps | 476.10 | 2.41× | 0.83 | 25.20 | 30.73 |
| FORA | N=3 | 416.88 | 2.40× | 0.69 | 29.45 | 30.52 |
| ToCa | N=4, R=90% | 300.41* | 2.47×* | 1.09 | 26.88 | 31.32 |
| TaylorSeer | N=5, O=2 | 357.39* | 2.54×* | 0.94 | 42.81 | 31.74 |
| ECAD | slow (256→1024) | 644.05 | 1.73× | 1.05 | 22.15 | 31.00 |
| ECAD | fast (256→1024) | 376.62 | 2.63× | 1.05 | 26.69 | 30.91 |

This demonstrates that ECAD's discovered caching patterns capture fundamental properties of the diffusion process that remain effective across different resolutions, making it practical for deployment in varied settings without requiring resolution-specific optimization.


Qualitative Results

PixArt-α 256×256: Comparisons between baseline, ToCa, and ECAD on unseen prompts. ECAD maintains high visual quality while achieving significant speedups.

FLUX-1.dev 256×256: Left-to-right: uncached baseline, ToCa (1.75× speedup), ECAD "fast" (1.97× speedup). ECAD yields sharper images with improved prompt adherence.

Learned Caching Schedules

ECAD discovers diverse caching patterns that vary across timesteps, blocks, and components. Red indicates cached components, gray indicates recomputed.

PixArt-α "fast": Left-to-right: self-attention, cross-attention, feedforward.

FLUX-1.dev "fast": Left-to-right components for full blocks (numbered 0-18): multi-stream joint-attention, feedforward, and feedforward context. Single blocks (numbered 19-56): single-stream joint-attention, linear MLP input projection, and linear MLP output projection.

Citation

If you find our work useful, please consider citing:

@misc{aggarwal2025evolutionarycachingaccelerateofftheshelf,
      title={Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model},
      author={Anirud Aggarwal and Abhinav Shrivastava and Matthew Gwilliam},
      year={2025},
      eprint={2506.15682},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.15682},
}