Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

University of Maryland, College Park

ECAD discovers efficient caching schedules for diffusion models through genetic algorithms, forming smooth Pareto frontiers that balance image quality and inference speed. Our method works with off-the-shelf models like PixArt-α, PixArt-Σ, and FLUX-1.dev without any parameter tuning.

Abstract

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures.

We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models.

Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-α, PixArt-Σ, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-α, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35× to 2.58×.

Interactive Pareto Frontier Explorer

Explore the discovered Pareto frontiers interactively. The visualization shows the Pareto frontiers on PixArt-α at 256×256 resolution on the PartiPrompts set, which is unseen by ECAD during optimization. These images give some visual insight into the quality of the images generated by each schedule, while the quality metrics in the tables below provide a more quantitative comparison.

Interactive visualization of ECAD's Pareto-optimal schedules. Best viewed on desktop for full interactivity.

Method Overview

ECAD formulates diffusion caching as a multi-objective optimization problem, discovering Pareto-optimal trade-offs between computational efficiency and generation quality. Our approach uses genetic algorithms to evolve caching schedules represented as binary tensors $S \in \{0,1\}^{N \times B \times C}$, where $N$ is the number of diffusion steps, $B$ is the number of transformer blocks, and $C$ is the number of cacheable components per block.
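
To make this representation concrete, the minimal sketch below shows how such a binary schedule could gate whether each transformer component recomputes its output or reuses a cached one at a given step. The shapes, the toy schedule, and the helper `run_component` are illustrative assumptions, not the released ECAD implementation.

```python
import torch

# Illustrative shapes (assumptions): N diffusion steps, B transformer blocks,
# C cacheable components per block (e.g. self-attention, cross-attention, FFN).
N, B, C = 20, 28, 3

# A caching schedule S in {0,1}^{N x B x C}: S[n, b, c] = 1 means "reuse the
# cached output of component c in block b at step n"; 0 means "recompute it".
S = torch.zeros(N, B, C, dtype=torch.bool)
S[1::2] = True  # toy schedule: reuse every component on every other step

def run_component(step, block, comp, compute_fn, cache):
    """Run one cacheable component, reusing its cached output when S says so."""
    key = (block, comp)
    if S[step, block, comp] and key in cache:
        return cache[key]      # reuse the feature computed at an earlier step
    out = compute_fn()         # otherwise recompute the component ...
    cache[key] = out           # ... and refresh the cache for later steps
    return out
```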

ECAD method overview showing transformer architecture and genetic evolution

Key Components

  • Component-level Caching: We cache individual transformer components (self-attention, cross-attention, feedforward) rather than entire blocks.
  • Genetic Algorithm: NSGA-II evolves a population of caching schedules using selection, crossover, and mutation operations (a sketch of the crossover and mutation operators follows this list).
  • Multi-objective Optimization: Simultaneously optimize for low computational cost (TMACs) and high generation quality (Image Reward).
  • Calibration-based: Uses only 100 text prompts for optimization, no image data required.
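
As a rough illustration of the variation operators named above, and again in the algorithm below (4-point crossover and bit-flip mutation), here is a minimal NumPy sketch operating on flattened binary schedules. The helper names and random-number handling are assumptions for illustration, not ECAD's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover_4pt(parent_a: np.ndarray, parent_b: np.ndarray):
    """4-point crossover: swap alternating segments between two flat binary schedules.
    Assumes the flattened schedules have more than four entries."""
    a, b = parent_a.ravel().copy(), parent_b.ravel().copy()
    points = np.sort(rng.choice(np.arange(1, a.size), size=4, replace=False))
    swap, prev = False, 0
    for p in list(points) + [a.size]:
        if swap:  # exchange this segment between the two children
            a[prev:p], b[prev:p] = b[prev:p].copy(), a[prev:p].copy()
        swap, prev = not swap, p
    return a.reshape(parent_a.shape), b.reshape(parent_b.shape)

def mutate_bitflip(schedule: np.ndarray, p_m: float = 0.01) -> np.ndarray:
    """Bit-flip mutation: flip each cache bit independently with probability p_m."""
    mask = rng.random(schedule.shape) < p_m
    return np.logical_xor(schedule, mask).astype(schedule.dtype)
```

In a typical generation, operators like these would be applied to schedules chosen by NSGA-II selection before the offspring are re-evaluated for quality and cost.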

ECAD Algorithm

Input: Diffusion model $\mathcal{M}$, calibration prompts $\mathcal{P} = \{p_1, ..., p_m\}$, population size $n$, generations $G$, crossover probability $p_c$, mutation probability $p_m$

Output: Pareto frontier $\mathcal{F}$ of caching schedules

1: $\Pi_0 \leftarrow \text{InitializePopulation}(n)$     // Random and heuristic schedules
2: for $g = 1$ to $G$ do
3:     for each schedule $S \in \Pi_{g-1}$ do
4:         $\mathcal{I} \leftarrow \mathcal{M}_S(\mathcal{P})$     // Generate images with schedule $S$
5:         $S_{q} \leftarrow \mathcal{Q}(\mathcal{P}, \mathcal{I})$     // Compute Image Reward score
6:         $S_{c} \leftarrow \mathcal{C}(S)$     // Compute TMACs
7:     end for
8:     $\Pi_g \leftarrow \text{NSGA-II-Selection}(\Pi_{g-1})$     // Tournament selection on $(S_q, S_c)$
9:     $\Pi_g \leftarrow \text{Crossover}(\Pi_g, p_c)$     // 4-point crossover
10:     $\Pi_g \leftarrow \text{Mutation}(\Pi_g, p_m)$     // Bit-flip mutation
11: end for
12: $\mathcal{F} \leftarrow \text{ComputeParetoFrontier}(\bigcup_{g=0}^{G} \Pi_g)$
13: return $\mathcal{F}$

Note: Each schedule $S \in \{0,1\}^{N \times B \times C}$ is a binary tensor where $N$ = diffusion steps, $B$ = transformer blocks, $C$ = cacheable components per block.
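
For step 12, a straightforward (if quadratic) way to extract the non-dominated set from all evaluated schedules is sketched below. The function name and the example score arrays are assumptions for illustration; the example numbers reuse the ECAD fast/faster/fastest TMACs and Image Reward values from the results table.

```python
import numpy as np

def pareto_frontier(tmacs, image_reward):
    """Return indices of schedules that no other schedule dominates.

    Schedule j dominates schedule i if it is no worse in both objectives
    (lower TMACs, higher Image Reward) and strictly better in at least one.
    """
    tmacs, image_reward = np.asarray(tmacs), np.asarray(image_reward)
    frontier = []
    for i in range(len(tmacs)):
        dominated = np.any(
            (tmacs <= tmacs[i]) & (image_reward >= image_reward[i])
            & ((tmacs < tmacs[i]) | (image_reward > image_reward[i]))
        )
        if not dominated:
            frontier.append(i)
    return frontier

# Example: three schedules that trade cost for quality are all non-dominated.
print(pareto_frontier([2.13, 1.46, 1.18], [0.99, 0.88, 0.77]))  # -> [0, 1, 2]
```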

Results

PixArt-α Pareto frontier showing trade-offs between image quality and latency
PixArt-α: Pareto frontier at 256×256 resolution on PartiPrompts (unseen during calibration), demonstrating superior trade-offs compared to FORA, ToCa, and TGATE.
FLUX-1.dev Pareto frontier across different speedup configurations
FLUX-1.dev: Pareto frontiers at 256×256 resolution on PartiPrompts (unseen during calibration). ECAD discovers efficient schedules across a wide range of speedups.
ECAD evolution across 550 generations showing progressive improvement
ECAD Evolution: Progressive improvement of Pareto frontiers across 550 generations on PixArt-α. Pareto frontiers are shown as a function of TMACs and Image Reward score on the Image Reward calibration set. Lighter shades indicate earlier generations and darker indicate later ones. The 'final' frontier across all 550 generations is denoted in black.

Quantitative Results at 256×256 Resolution

We evaluate ECAD on three popular diffusion models with 20-step generation; the table below reports results for PixArt-α and FLUX-1.dev. Our method consistently outperforms prior approaches across multiple metrics while providing flexible speed-quality trade-offs. Despite being optimized only on Image Reward using 100 calibration prompts, ECAD achieves superior results on unseen benchmarks.

| Model | Caching Method | Setting | TMACs↓ | Speedup↑ | PartiPrompts Image Reward↑ | MJHQ FID↓ | MJHQ CLIP↑ |
|---|---|---|---|---|---|---|---|
| PixArt-α | None | - | 5.71 | 1.00× | 0.97 | 9.75 | 32.77 |
| PixArt-α | TGATE | m=15, k=1 | 4.86 | 1.14× | 0.87 | 10.38 | 32.33 |
| PixArt-α | FORA | N=2 | 2.87 | 1.65× | 0.91 | 10.33 | 32.74 |
| PixArt-α | ToCa | N=3, R=90% | 2.13 | 2.35× | 0.68 | 11.80 | 32.35 |
| PixArt-α | DuCa | N=3, R=60% | 3.20 | 2.29× | 0.79 | 11.69 | 32.48 |
| PixArt-α | DuCa | N=3, R=90% | 2.30 | 2.59× | 0.74 | 12.53 | 32.39 |
| PixArt-α | ECAD (Ours) | fast | 2.13 | 1.97× | 0.99 | 8.02 | 32.78 |
| PixArt-α | ECAD (Ours) | faster | 1.46 | 2.40× | 0.88 | 9.92 | 32.34 |
| PixArt-α | ECAD (Ours) | fastest | 1.18 | 2.58× | 0.77 | 8.67 | 32.24 |
| FLUX-1.dev | None | - | 198.69 | 1.00× | 1.04 | 17.77 | 31.06 |
| FLUX-1.dev | FORA | N=3 | 69.80 | 2.44× | 0.93 | 19.38 | 31.10 |
| FLUX-1.dev | ToCa | N=4, R=90% | 42.96 | 1.66× | 0.93 | 21.59 | 30.88 |
| FLUX-1.dev | ECAD (Ours) | fast | 63.02 | 2.58× | 1.04 | 16.14 | 31.69 |
| FLUX-1.dev | ECAD (Ours) | fastest | 43.60 | 3.37× | 0.89 | 21.43 | 31.67 |

Resolution Transfer Results

One of ECAD's key strengths is its ability to generalize across resolutions. We demonstrate this by applying FLUX-1.dev schedules optimized at 256×256 resolution directly to 1024×1024 image generation, without any additional optimization. Despite the 16× increase in pixel count, our schedules maintain competitive performance compared to methods specifically optimized for high resolution.

| Model | Caching Method | Setting | TMACs↓ | Speedup↑ | PartiPrompts Image Reward↑ | COCO FID↓ | COCO CLIP↑ |
|---|---|---|---|---|---|---|---|
| FLUX-1.dev | None | - | 1190.25 | 1.00× | 1.14 | 25.45 | 31.08 |
| FLUX-1.dev | None | 40% steps | 476.10 | 2.41× | 0.83 | 25.20 | 30.73 |
| FLUX-1.dev | FORA | N=3 | 416.88 | 2.40× | 0.69 | 29.45 | 30.52 |
| FLUX-1.dev | ToCa | N=4, R=90% | 300.41 | 2.47× | 1.09 | 26.88 | 31.32 |
| FLUX-1.dev | ECAD (Ours) | slow (256→1024) | 644.05 | 1.73× | 1.05 | 22.15 | 31.00 |
| FLUX-1.dev | ECAD (Ours) | fast (256→1024) | 376.62 | 2.63× | 1.05 | 26.69 | 30.91 |

This demonstrates that ECAD's discovered caching patterns capture fundamental properties of the diffusion process that remain effective across different resolutions, making it practical for deployment in varied settings without requiring resolution-specific optimization.

Key Findings

  • Superior Quality-Speed Trade-offs: ECAD consistently discovers schedules that outperform prior methods in both quality metrics and speedup factors.
  • Generalization: Schedules optimized at 256×256 resolution maintain competitive performance when applied to 1024×1024 images.
  • Model Transfer: ECAD schedules can be transferred between PixArt variants with minimal performance degradation.
  • Fast Convergence: Competitive schedules emerge within 50 generations, with continued improvements over longer optimization.

Qualitative Results

Qualitative comparison showing generated images from baseline, ToCa, and ECAD

PixArt-α 256×256 qualitative comparisons between baseline, ToCa, and our ECAD schedules on unseen prompts. ECAD maintains high visual quality while achieving significant speedups.

FLUX-1.dev qualitative comparison at 256×256 resolution

FLUX-1.dev 256×256 qualitative comparisons, unseen prompts. Left-to-right: uncached baseline, ToCa (N=5, R=90%; 1.75× speedup), and our "fast" ECAD schedule (1.97× speedup). ECAD consistently yields sharper images with improved prompt adherence.

Learned Caching Schedules

ECAD discovers diverse caching patterns that vary across timesteps, blocks, and components. The visualizations below show which components are cached (red) vs. recomputed (gray) for our optimized schedules.

PixArt-α fast schedule visualization
PixArt-α "fast" schedule: Components from left to right are self-attention, cross-attention, and feedforward. Red indicates cached components.
FLUX-1.dev fast schedule visualization
FLUX-1.dev "fast" schedule: caching visualization. Left-to-right components for full blocks (numbered 0-18): multi-stream joint-attention, feedforward, and feedforward context. Single blocks (numbered 19-56): single-stream joint-attention, linear MLP input projection, and linear MLP output projection.

Citation

If you find our work useful in your research, please consider citing:

@misc{aggarwal2025evolutionarycachingaccelerateofftheshelf,
      title={Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model}, 
      author={Anirud Aggarwal and Abhinav Shrivastava and Matthew Gwilliam},
      year={2025},
      eprint={2506.15682},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.15682}, 
}