Diffusion-based image generation models excel at producing
high-quality synthetic content, but suffer from slow and
computationally expensive inference. Prior work has attempted to
mitigate this by caching and reusing features within diffusion
transformers across inference steps. These methods, however,
often rely on rigid heuristics that result in limited
acceleration or poor generalization across architectures.
We propose Evolutionary Caching to Accelerate Diffusion models (ECAD),
a genetic algorithm that learns efficient, per-model caching schedules
forming a Pareto frontier, using only a small set of calibration
prompts.
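The evolutionary search can be sketched as follows. This is a minimal illustration, not the paper's implementation: the schedule encoding (a binary mask over inference steps), the toy fitness function, and the selection, crossover, and mutation operators are all simplifying assumptions.

```python
import random

random.seed(0)

STEPS, POP, GENS = 20, 16, 10  # toy sizes; real searches are larger

def random_schedule():
    # A schedule marks, per inference step, whether cached features
    # are reused (1) or recomputed (0).
    return [random.randint(0, 1) for _ in range(STEPS)]

def fitness(s):
    # Hypothetical stand-in for the real objective: reward speedup
    # (fraction of cached steps) while penalizing long runs of cache
    # reuse, a crude proxy for quality degradation.
    speedup = sum(s) / STEPS
    penalty = sum(1 for a, b in zip(s, s[1:]) if a == b == 1)
    return speedup - 0.1 * penalty

def crossover(a, b):
    # Single-point crossover between two parent schedules.
    cut = random.randrange(1, STEPS)
    return a[:cut] + b[cut:]

def mutate(s, rate=0.05):
    # Flip each bit independently with a small probability.
    return [bit ^ (random.random() < rate) for bit in s]

def evolve():
    pop = [random_schedule() for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: POP // 2]                      # keep the best half
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(POP - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
```

In practice the fitness would be evaluated by generating images for the calibration prompts under each candidate schedule and scoring quality against latency, with the non-dominated schedules retained as the Pareto frontier.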
ECAD requires no modifications to network parameters or
reference images. It offers significant inference speedups,
enables fine-grained control over the quality-latency trade-off,
and adapts seamlessly to different diffusion models.
Notably, ECAD's learned schedules can generalize effectively to
resolutions and model variants not seen during calibration. We
evaluate ECAD on PixArt-α, PixArt-Σ, and FLUX.1-dev using
multiple metrics (FID, CLIP, Image Reward) across diverse
benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating
consistent improvements over previous approaches. On PixArt-α,
ECAD identifies a schedule that outperforms the previous
state-of-the-art method by 4.47 COCO FID while increasing
inference speedup from 2.35x to 2.58x.