Survey Paper + Interactive Atlas

Efficient Video Diffusion Models: Advancements and Challenges

Shitong Shao, Lichen Bai, Pengfei Wan, James Kwok, Zeke Xie

This survey provides a systematic, deployment-oriented review of efficient video diffusion models. It organizes the field into four main paradigms (step distillation, efficient attention, model compression, and cache/trajectory optimization) and emphasizes the trade-off between reducing the number of function evaluations and reducing per-step overhead.
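The trade-off named above can be made concrete with a toy sketch. This is not any surveyed method; all names are hypothetical. Step distillation reduces how many times the expensive network is called, while cache-style acceleration keeps the step count but reuses an earlier block output on some steps:

```python
def heavy_block(x):
    """Stand-in for an expensive transformer block (costs 1 "call")."""
    return x * 0.5 + 0.2

def sample(steps, cache_interval=1):
    """Toy denoising loop.

    cache_interval=1 -> recompute the block every step (baseline).
    cache_interval=k -> recompute only every k-th step and reuse the
                        cached output in between (cache-style reuse).
    Using a smaller `steps` instead models step distillation.
    Returns (final_value, number_of_heavy_calls).
    """
    x, cached, calls = 1.0, None, 0
    for t in range(steps):
        if cached is None or t % cache_interval == 0:
            cached = heavy_block(x)  # pay the per-step cost
            calls += 1
        x = cached  # toy update: take the (possibly cached) block output
    return x, calls

baseline = sample(steps=50)                  # 50 heavy calls
distilled = sample(steps=4)                  # fewer function evaluations
cached = sample(steps=50, cache_interval=2)  # same steps, cheaper on average
```

The two levers are orthogonal: distillation shrinks `steps`, caching shrinks the cost per step, and several surveyed systems combine both.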

Taxonomy Overview

Step Distillation

Distribution Distillation

Streaming Distillation

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, Jun Zhu

2026 · arxiv.org
Open Paper

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, Siyu Zhou, Qian He, Jing Liu

2025 · Computer Vision and Pattern Recognition
Open Paper

AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path

Z Yu, A Hayakawa, M Ishii, Q Yu, T Shibuya

2025 · arxiv.org
Open Paper

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, Lu Jiang

2025 · arxiv.org
Open Paper

BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation

Z Zhang, S Chang, Y He, Y Han, J Tang

2025 · arxiv.org
Open Paper

Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Jung Yi, Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, Seungryong Kim

2025 · arxiv.org
Open Paper

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Y Guo, C Yang, H He, Y Zhao, M Wei, Z Yang

2025 · arxiv.org
Open Paper

Generative pre-trained autoregressive diffusion transformer

Y Zhang, J Jiang, G Ma, Z Lu, H Huang, J Yuan

2025 · arxiv.org
Open Paper

InfVSR: Breaking Length Limits of Generic Video Super-Resolution

Z Zhang, K Liu, Z Chen, X Li, Y Chen, B Duan

2025 · arxiv.org
Open Paper

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

S Xiao, XI Zhang, D Meng, Q Wang, P Zhang

2025 · arxiv.org
Open Paper

Live avatar: Streaming real-time audio-driven avatar generation with infinite length

Yubo Huang, Hailong Guo, Fangtai Wu, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, others

2025 · arxiv.org
Open Paper

LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model

X Wang, Z Wu, P Peng

2025 · arxiv.org
Open Paper

LongLive: Real-time Interactive Long Video Generation

Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen

2025 · arxiv.org
Open Paper

Lumos-1: On autoregressive video generation from a unified model perspective

H Yuan, W Chen, J Cen, H Yu, J Liang

2025 · arxiv.org
Open Paper

Magicinfinite: Generating infinite talking videos with your words and voice

H Yi, T Ye, S Shao, X Yang, J Zhao, H Guo

2025 · arxiv.org
Open Paper

Matrix-game 2.0: An open-source real-time and streaming interactive world model

Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Size Wu, Wei Li, Xuchen Song, Yang Liu, Yangguang Li, Yahui Zhou

2025 · arxiv.org
Open Paper

Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation

T Zhu, S Zhang, Z Sun, J Tian, Y Tang

2025 · arxiv.org
Open Paper

Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft

Junchao Huang, Xinting Hu, Boyao Han, Shaoshuai Shi, Zhuotao Tian, Tianyu He, Li Jiang

2025 · arxiv.org
Open Paper

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

2025 · arxiv.org
Open Paper

Playing with Transformer at 30+ FPS via Next-Frame Diffusion

Xinle Cheng, Tianyu He, Jiayi Xu, Junliang Guo, Di He, Jiang Bian

2025 · arxiv.org
Open Paper

RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer

F Du, T Li, Z Zhang, Q Qiao, T Yu, D Zhen, X Jia

2025 · arxiv.org
Open Paper

Real-Time Motion-Controllable Autoregressive Video Diffusion

Kesen Zhao, Jiaxin Shi, Beier Zhu, Junbao Zhou, Xiaolong Shen, Yuan Zhou, Qianru Sun, Hanwang Zhang

2025 · arxiv.org
Open Paper

REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation

H Wang, Y Weng, X Yu, J Du, H Xu, X Wu, S He

2025 · arxiv.org
Open Paper

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, others

2025 · arxiv.org
Open Paper

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Kunhao Liu, Wenbo Hu, Jiale Xu, Ying Shan, Shijian Lu

2025 · arxiv.org
Open Paper

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman

2025 · arxiv.org
Open Paper

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, Cho-Jui Hsieh

2025 · arxiv.org
Open Paper

SkyReels-V2: Infinite-length Film Generative Model

Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, Weiming Xiong, Wei Wang, Nuo Pang, Kang Kang, Zhiheng Xu, Yuzhe Jin, Yupeng Liang, Yubing Song, Peng Zhao, Boyuan Xu, Di Qiu, Debang Li, Zhengcong Fei, Yang Li, Yahui Zhou

2025 · arxiv.org
Open Paper

StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars

Z Sun, Z Peng, Y Ma, Y Chen, Z Zhou, Z Zhou

2025 · arxiv.org
Open Paper

Streamdit: Real-time streaming text-to-video generation

A Kodaira, T Hou, J Hou, M Georgopoulos

2025 · arxiv.org
Open Paper

TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models

Chetwin Low, Weimin Wang

2025 · arxiv.org
Open Paper

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng, Kun Yan, Runpei Dong, Duomin Wang, Zheng Ge, Nan Duan, Xiangyu Zhang

2025 · Computer Vision and Pattern Recognition
Open Paper

UniCP: A Unified Caching and Pruning Framework for Efficient Video Generation

Wenzhang Sun, Qirui Hou, Donglin Di, Jiahui Yang, Yongjia Ma, Jianxun Cui

2025 · Proceedings of the 7th ACM International Conference on Multimedia in Asia
Open Paper

VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

Y Yu, X Wu, X Hu, T Hu, Y Sun, X Lyu, B Wang

2025 · arxiv.org
Open Paper

ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation

F Yang, H Li, P Li, W Yuan, L Qiu, C Song

2025 · arxiv.org
Open Paper

Autoregressive Video Generation without Vector Quantization

Haoge Deng, Ting Pan, Haiwen Diao, Zhengxiong Luo, Yufeng Cui, Huchuan Lu, Shiguang Shan, Yonggang Qi, Xinlong Wang

2024 · arxiv.org
Open Paper

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

2024 · Neural Information Processing Systems
Open Paper

Diffusion Models Are Real-Time Game Engines

Dani Valevski, Yaniv Leviathan, Moab Arar, Shlomi Fruchter

2024 · International Conference on Learning Representations
Open Paper

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han

2024 · Neural Information Processing Systems
Open Paper

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Frédo Durand, Eli Shechtman, Xun Huang

2024 · Computer Vision and Pattern Recognition
Open Paper

Looking backward: Streaming video-to-video translation with feature banks

F Liang, A Kodaira, C Xu, M Tomizuka

2024 · arxiv.org
Open Paper

Streaming video diffusion: Online video editing with diffusion models

F Chen, Z Yang, B Zhuang, Q Wu

2024 · arxiv.org
Open Paper

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

2024 · Computer Vision and Pattern Recognition
Open Paper

Non-Streaming Distillation

Transition Matching Distillation for Fast Video Generation

W Nie, J Berner, N Ma, C Liu, S Xie

2026 · arxiv.org
Open Paper

BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation

Youping Gu, Xiaolong Li, Yuhao Hu, Minqi Chen, Bohan Zhuang

2025 · arxiv.org
Open Paper

EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise

C Liu, A Vahdat

2025 · arxiv.org
Open Paper

FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue

2025 · arxiv.org
Open Paper

Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang

2025 · arxiv.org
Open Paper

Magic 1-For-1: Generating One Minute Video Clips within One Minute

Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou

2025 · arxiv.org
Open Paper

MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis

S Shao, H Yi, H Guo, T Ye, D Zhou

2025 · arxiv.org
Open Paper

Neodragon: Mobile Video Generation using Diffusion Transformer

Animesh Karnewar, Denis Korzhenkov, Ioannis Lelekas, Adil Karjauv, Noor Fathima, Hanwen Xiong, Vancheeswaran Vaidyanathan, Will Zeng, Rafael Esteves, Tushar Singhal, others

2025 · arxiv.org
Open Paper

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Y Gao, H Guo, T Hoang, W Huang, L Jiang

2025 · arxiv.org
Open Paper

Worldplay: Towards long-term geometric consistency for real-time interactive world modeling

Wenqiang Sun, Haiyu Zhang, Haoyuan Wang, Junta Wu, Zehan Wang, Zhenwei Wang, Yunhong Wang, Jun Zhang, Tengfei Wang, Chunchao Guo

2025 · arxiv.org
Open Paper

Accelerating Video Diffusion Models via Distribution Matching

Yuanzhi Zhu, Hanshu Yan, Huan Yang, Kai Zhang, Junnan Li

2024 · arxiv.org
Open Paper

Dreamr: Diffusion-driven counterfactual explanation for functional MRI

HA Bedel, T Çukur

2024 · IEEE Transactions on Medical Imaging
Open Paper

Diffusiontalker: Personalization and acceleration for speech-driven 3d face diffuser

P Chen, X Wei, M Lu, Y Zhu, N Yao, X Xiao

2023 · arxiv.org
Open Paper

MagicVideo: Efficient Video Generation With Latent Diffusion Models

Daquan Zhou, Weimin Wang, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng

2022 · arxiv.org
Open Paper

Consistency Distillation

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang

2025 · AAAI Conference on Artificial Intelligence
Open Paper

DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K Wong, Yu Qiao, Ziwei Liu

2025 · arxiv.org
Open Paper

DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

Zheng Chen, Zichen Zou, Kewei Zhang, Xiongfei Su, Xin Yuan, Yong Guo, Yulun Zhang

2025 · arxiv.org
Open Paper

GFix: Perceptually Enhanced Gaussian Splatting Video Compression

S Teng, G Gao, D Danier, Y Jiang, F Zhang

2025 · arxiv.org
Open Paper

Improved training technique for latent consistency models

Q Dao, K Doan, D Liu, T Le, D Metaxas

2025 · arxiv.org
Open Paper

MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices

Shuai Zhang, Bao Tang, Siyuan Yu, Yueting Zhu, Jingfeng Yao, Ya Zou, Shanglin Yuan, Li Yu, Wenyu Liu, Xinggang Wang

2025 · arxiv.org
Open Paper

SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment

Yanxiao Sun, Jiafu Wu, Yun Cao, Chengming Xu, Yabiao Wang, Weijian Cao, Donghao Luo, Chengjie Wang, Yanwei Fu

2025 · arxiv.org
Open Paper

Taming Consistency Distillation for Accelerated Human Image Animation

Xiang Wang, Shiwei Zhang, Hangjie Yuan, Yujie Wei, Yingya Zhang, Changxin Gao, Yuehuan Wang, Nong Sang

2025 · arxiv.org
Open Paper

UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space

Yong Liu, Jinshan Pan, Yinchuan Li, Qingji Dong, Chao Zhu, Yu Guo, Fei Wang

2025 · Proceedings of the 33rd ACM International Conference on Multimedia
Open Paper

Vividface: High-quality and efficient one-step diffusion for video face enhancement

S Zhang, Y Guo, L Peng, Z Wang, Y Chen, W Li

2025 · arxiv.org
Open Paper

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization

Zihan Ding, Chi Jin, Difan Liu, Haitian Zheng, Krishna Kumar Singh, Qiang Zhang, Yan Kang, Zhe Lin, Yuchen Liu

2024 · arxiv.org
Open Paper

Efficient Text-driven Motion Generation via Latent Consistency Training

M Hu, M Zhu, X Zhou, Q Yan, S Li, C Liu

2024 · arxiv.org
Open Paper

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

Youyuan Zhang, Xuan Ju, James J. Clark

2024 · IEEE Workshop/Winter Conference on Applications of Computer Vision
Open Paper

Lm2d: Lyrics-and music-driven dance synthesis

W Yin, X Zhao, Y Yu, H Yin, D Kragic

2024 · arxiv.org
Open Paper

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

2024 · Neural Information Processing Systems
Open Paper

Motionlcm: Real-time controllable motion generation via latent consistency model

W Dai, LH Chen, J Wang, J Liu, B Dai

2024 · Computer Vision – ECCV 2024
Open Paper

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-yun Wang, Jiangning Zhang, Hao Chen, Mingmin Chi, Yabiao Wang, Wenhan Luo

2024 · Computer Vision and Pattern Recognition
Open Paper

Phased consistency models

Fu-Yun Wang, Zhaoyang Huang, Alexander Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, others

2024 · Neural Information Processing Systems
Open Paper

Single Trajectory Distillation for Accelerating Image and Video Style Transfer

Sijie Xu, Runqi Wang, Wei Zhu, Dejia Song, Nemo Chen, Xu Tang, Yao Hu

2024 · Proceedings of the 33rd ACM International Conference on Multimedia
Open Paper

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

2024 · Neural Information Processing Systems
Open Paper

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang

2024 · arxiv.org
Open Paper

Timestep Embedding Tells: It’s Time to Cache for Video Diffusion Model

Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

2024 · Computer Vision and Pattern Recognition
Open Paper

VideoLCM: Video Latent Consistency Model

Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang

2023 · arxiv.org
Open Paper

Adversarial Distillation

Combined Distillation

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao

2025 · arxiv.org
Open Paper

Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis

Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, Jian-Huang Lai

2025 · arxiv.org
Open Paper

Large scale diffusion distillation via score-regularized continuous-time consistency

Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang

2025 · arxiv.org
Open Paper

LLIA--Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models

H Yu, Z Wang, Y Pan, M Cheng, H Yang

2025 · arxiv.org
Open Paper

Pose: Phased one-step adversarial equilibrium for video diffusion models

Jiaxiang Cheng, Bing Ma, Xuhua Ren, Hongyi Jin, Kai Yu, Peng Zhang, Wenyue Li, Yuan Zhou, Tianxiang Zheng, Qinglin Lu

2025 · arxiv.org
Open Paper

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

J Zhang, K Zheng, K Jiang, H Wang, I Stoica

2025 · arxiv.org
Open Paper

AnimateDiff-Lightning: Cross-Model Diffusion Distillation

Shanchuan Lin, Xiao Yang

2024 · arxiv.org
Open Paper

MoViE: Mobile Diffusion for Video Editing

Adil Karjauv, Noor Fathima, Ioannis Lelekas, Fatih Porikli, Amir Ghodrati, Amirhossein Habibian

2024 · arxiv.org
Open Paper

Independent Distillation

MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training

H Xue, Q Chen, Z Wang, X Huang

2025 · arxiv.org
Open Paper

MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model

L Jiang, Y Wei, H Ni

2025 · arxiv.org
Open Paper

Real-time One-Step Diffusion-based Expressive Portrait Videos Generation

H Guo, H Yi, D Zhou, AW Bergman

2024 · arxiv.org
Open Paper

Efficient Attention

Sparse Attention

Dynamic Sparsity

Bidirectional Sparse Attention for Faster Video Diffusion Training

Chenlu Zhan, Wen Li, Chuyu Shen, Jun Zhang, Suhui Wu, Hao Zhang

2025 · arxiv.org
Open Paper

Diffusion Adversarial Post-Training for One-Step Video Generation

Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang

2025 · International Conference on Machine Learning
Open Paper

DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance

Xuan Shen, Chenxia Han, Yufa Zhou, Yanyue Xie, Yifan Gong, Quanyi Wang, Yiwei Wang, Yanzhi Wang, Pu Zhao, Jiuxiang Gu

2025 · arxiv.org
Open Paper

FG-Attn: Leveraging Fine-Grained Sparsity In Diffusion Transformers

Sankeerth Durvasula, Kavya Sreedhar, Zain Moustafa, Suraj Kothawade, Ashish Gondimalla, Suvinay Subramanian, Narges Shahidi, Nandita Vijaykumar

2025 · arxiv.org
Open Paper

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers

L Qiao, Y Dai, Y Huang, H Kan, J Shi, H An

2025 · arxiv.org
Open Paper

Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation

Beijia Lu, Ziyi Chen, Jing Xiao, Jun-Yan Zhu

2025 · arxiv.org
Open Paper

LiteAttention: A Temporal Sparse Attention for Diffusion Transformers

Dor Shmilovich, Tony Wu, Aviad Dahan, Yuval Domb

2025 · arxiv.org
Open Paper

Mixture of Contexts for Long Video Generation

Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein

2025 · arxiv.org
Open Paper

MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation

Weinan Jia, Yuning Lu, Mengqi Huang, Hualiang Wang, Binyuan Huang, Nan Chen, Mu Liu, Jidong Jiang, Zhendong Mao

2025 · arxiv.org
Open Paper

OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution

Hanting Li, Huaao Tang, Jianhong Han, Tianxiong Zhou, Jiulong Cui, Haizhen Xie, Yan Chen, Jie Hu

2025 · arxiv.org
Open Paper

RainFusion 2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention

A Chen, Y Liu, J Huang, G Lian, Y Yao, W Lan

2025 · arxiv.org
Open Paper

Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

Ruichen Chen, Keith G. Mills, Liyao Jiang, Chao Gao, Di Niu

2025 · arxiv.org
Open Paper

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference

Jintao Zhang, Chendong Xiang, Haofeng Huang, Haocheng Xi, Jun Zhu, Jianfei Chen, others

2025 · Forty-second International Conference on Machine Learning
Open Paper

Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity

Haocheng Xi, Shuo Yang, Yilong Zhao, Chenfeng Xu, Muyang Li, Xiuyu Li, Yujun Lin, Han Cai, Jintao Zhang, Dacheng Li, Jianfei Chen, Ion Stoica, Kurt Keutzer, Song Han

2025 · International Conference on Machine Learning
Open Paper

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica

2025 · arxiv.org
Open Paper

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Yifei Xia, Suhan Ling, Fangcheng Fu, Yujie Wang, Huixia Li, Xuefeng Xiao, Bin Cui

2025 · arxiv.org
Open Paper

Training-Free Efficient Video Generation via Dynamic Token Carving

Y Zhang, J Xing, B Xia, S Liu, B Peng, X Tao

2025 · arxiv.org
Open Paper

Understanding Attention Mechanism in Video Diffusion Models

Bingyan Liu, Chengyu Wang, Tongtong Su, Huan Ten, Jun Huang, Kailing Guo, Kui Jia

2025 · arxiv.org
Open Paper

USV: Unified Sparsification for Accelerating Video Diffusion Models

Xinjian Wu, Hongmei Wang, Yuan Zhou, Qinglin Lu

2025 · arxiv.org
Open Paper

VMoBA: Mixture-of-Block Attention for Video Diffusion Models

Jianzong Wu, Liang Hou, Haotian Yang, Xin Tao, Ye Tian, Pengfei Wan, Di Zhang, Yunhai Tong

2025 · arxiv.org
Open Paper

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Zhao Jin, Jingyi Liao, Shunyu Liu, Dacheng Tao

2025 · arxiv.org
Open Paper

VSA: Faster Video Diffusion with Trainable Sparse Attention

Peiyuan Zhang, Yongqi Chen, Haofeng Huang, Will Lin, Zhengzhong Liu, Ion Stoica, Eric Xing, Hao Zhang

2025 · arxiv.org
Open Paper

Xattention: Block sparse attention with antidiagonal scoring

R Xu, G Xiao, H Huang, J Guo, S Han

2025 · arxiv.org
Open Paper

Object-centric diffusion for efficient video editing

K Kahatapitiya, A Karjauv, D Abati, F Porikli

2024 · Computer Vision – ECCV 2024
Open Paper

Open-Sora: Democratizing Efficient Video Production for All

Zangwei Zheng, Xiangyu Peng, Tianji Yang, Chenhui Shen, Shenggui Li, Hongxin Liu, Yukun Zhou, Tianyi Li, Yang You

2024 · arxiv.org
Open Paper

SF-V: Single Forward Video Generation Model

Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

2024 · Neural Information Processing Systems
Open Paper

ControlVideo: Training-free Controllable Text-to-Video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng Jiang, Xiaopeng Zhang, Wangmeng Zuo, Qi Tian

2023 · International Conference on Learning Representations
Open Paper

Static Sparsity

Attention surgery: An efficient recipe to linearize your video diffusion transformer

Mohsen Ghafoorian, Denis Korzhenkov, Amirhossein Habibian

2025 · arxiv.org
Open Paper

Ditvr: Zero-shot diffusion transformer for video restoration

S Gao, N Mehta, Z Wu, R Timofte

2025 · arxiv.org
Open Paper

Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile

Hangliang Ding, Dacheng Li, Runlong Su, Peiyuan Zhang, Zhijie Deng, Ion Stoica, Hao Zhang

2025 · arxiv.org
Open Paper

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

A Hassani, F Zhou, A Kane, J Huang, CY Chen

2025 · arxiv.org
Open Paper

Holocine: Holistic generation of cinematic multi-shot long video narratives

Y Meng, H Ouyang, Y Yu, Q Wang, W Wang

2025 · arxiv.org
Open Paper

Longcat-video technical report

MLC Team, X Cai, Q Huang, Z Kang, H Li

2025 · arxiv.org
Open Paper

PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models

Tianchen Zhao, Ke Hong, Xinhao Yang, Xuefeng Xiao, Huixia Li, Feng Ling, Ruiqi Xie, Siqi Chen, Hongyu Zhu, Yichong Zhang, Yu Wang

2025 · arxiv.org
Open Paper

Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation

Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

2025 · arxiv.org
Open Paper

Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers

Pengtao Chen, Xianfang Zeng, Maosen Zhao, Peng Ye, Mingzhu Shen, Wei Cheng, Gang Yu, Tao Chen

2025 · arxiv.org
Open Paper

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Y. Zhang, Manyuan Zhang, K. Cheung, Simon See, Hongwei Qin, Jifeng Da, Hongsheng Li

2024 · International Conference on Computer Graphics and Interactive Techniques
Open Paper

Open-sora plan: Open-source large video generation model

B Lin, Y Ge, X Cheng, Z Li, B Zhu, S Wang, X He

2024 · arxiv.org
Open Paper

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

2024 · Computer Vision and Pattern Recognition
Open Paper

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama

2023 · European Conference on Computer Vision
Open Paper

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

2022 · Computer Vision and Pattern Recognition
Open Paper

Scalable adaptive computation for iterative generation

A Jabri, D Fleet, T Chen

2022 · arxiv.org
Open Paper

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou

2022 · IEEE International Conference on Computer Vision
Open Paper

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

2021 · Computer Vision and Pattern Recognition
Open Paper
Linear Attention

Training-Based

ReHyAt: Recurrent Hybrid Attention for Video Diffusion Transformers

M Ghafoorian, A Habibian

2026 · arxiv.org
Open Paper

Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers

X Ma, Y Wang, G Jia, X Chen, TT Wong

2025 · IEEE Transactions on Pattern Analysis and Machine Intelligence
Open Paper

Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers

S Ren, Q Yu, J He, A Yuille, LC Chen

2025 · arxiv.org
Open Paper

Linvideo: A post-training framework towards O(n) attention in efficient video generation

Yushi Huang, Xingtong Ge, Ruihao Gong, Chengtao Lv, Jun Zhang

2025 · arxiv.org
Open Paper

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Junsong Chen, Yuyang Zhao, Jincheng Yu, Ruihang Chu, Junyu Chen, Shuai Yang, Xianbang Wang, Yicheng Pan, Daquan Zhou, Huan Ling, Haozhe Liu, Hongwei Yi, Hao Zhang, Muyang Li, Yukang Chen, Han Cai, Sanja Fidler, Ping Luo, Song Han, Enze Xie

2025 · arxiv.org
Open Paper

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Jintao Zhang, Haoxu Wang, Kai Jiang, Shuo Yang, Kaiwen Zheng, Haocheng Xi, Ziteng Wang, Hongzhou Zhu, Min Zhao, Ion Stoica, Joseph E. Gonzalez, Jun Zhu, Jianfei Chen

2025 · arxiv.org
Open Paper

Unianimate: Taming unified video diffusion models for consistent human image animation

X Wang, S Zhang, C Gao, J Wang, X Zhou

2025 · Science China Information Sciences
Open Paper

Efficient Long-duration Talking Video Synthesis with Linear Diffusion Transformer under Multimodal Guidance

H Zhang, Z Liang, R Fu, B Liu, Z Wen, X Liu

2024 · arxiv.org
Open Paper

MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion

X Xu, M Cao

2024 · arxiv.org
Open Paper

Qihoo-t2x: An efficient proxy-tokenized diffusion transformer for text-to-any-task

J Wang, A Ma, J Feng, D Leng, Y Yin

2024 · arxiv.org
Open Paper

Zigma: A dit-style zigzag mamba diffusion model

VT Hu, SA Baumann, M Gui, O Grebenkova

2024 · Computer Vision – ECCV 2024
Open Paper

Training-Free

Model Compression

Quantization

Quantization-Aware Training

FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion

Akide Liu, Zeyu Zhang, Zhexin Li, Xuehai Bai, Yizeng Han, Jiasheng Tang, Yuanjie Xing, Jichao Wu, Mingyang Yang, Weihua Chen, Jiahao He, Yuanyu He, Fan Wang, Gholamreza Haffari, Bohan Zhuang

2025 · arxiv.org
Open Paper

MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation

O Zatsarynna, E Bahrami, YA Farha

2025 · arxiv.org
Open Paper

Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers

Weilun Feng, Chuanguang Yang, Haotong Qin, Xiangqi Li, Yu Wang, Zhulin An, Libo Huang, Boyu Diao, Zixiang Zhao, Yongjun Xu, Michele Magno

2025 · International Conference on Machine Learning
Open Paper

QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification

Weilun Feng, Chuanguang Yang, Haotong Qin, Mingqiang Wu, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu

2025 · arxiv.org
Open Paper

QVGen: Pushing the Limit of Quantized Video Generative Models

Yushi Huang, Ruihao Gong, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang

2025 · arxiv.org
Open Paper

Optical flow representation alignment mamba diffusion model for medical video generation

Z Wang, L Zhang, L Wang, M Zhu, Z Zhang

2024 · arxiv.org
Open Paper

Scaling diffusion mamba with bidirectional ssms for efficient image and video generation

S Mo, Y Tian

2024 · arxiv.org
Open Paper

Post-Training Quantization

DVD-Quant: Data-free Video Diffusion Transformers Quantization

Zhiteng Li, Hanxuan Li, Junyi Wu, Kai Liu, Haotong Qin, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

2025 · arxiv.org
Open Paper

LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation

L Yang, H Lin, T Zhao, Y Wu, H Zhu, R Xie

2025 · arxiv.org
Open Paper

Mpq-dmv2: Flexible residual mixed precision quantization for low-bit diffusion models with temporal distillation

W Feng, C Yang, H Qin, Y Li, X Li, Z An

2025 · arxiv.org
Open Paper

PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement

ZF Feng, L Peng, X Di, Y Guo, W Li, Y Zhang

2025 · arxiv.org
Open Paper

SageAttention3: Microscaling FP4 Attention for Inference and an Exploration of 8-Bit Training

J Zhang, J Wei, P Zhang, X Xu, H Huang

2025 · arxiv.org
Open Paper

A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging

M Cao, L Wang, H Wang, X Yuan

2024 · Computer Vision – ECCV 2024
Open Paper

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

2024 · Computer Vision and Pattern Recognition
Open Paper

SageAttention: Accurate 8-Bit Attention for Plug-and-Play Inference Acceleration

J Zhang, J Wei, H Huang, P Zhang, J Zhu

2024 · arxiv.org
Open Paper

SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-Thread INT4 Quantization

J Zhang, H Huang, P Zhang, J Wei, J Zhu

2024 · arxiv.org
Open Paper

TaQ-DiT: Time-aware Quantization for Diffusion Transformers

X Liu, H Shi, Y Xu, Z Wang

2024 · IEEE Transactions on Circuits and Systems for Video Technology
Open Paper

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

2024 · International Conference on Learning Representations
Open Paper
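
The post-training quantization entries above share one core operation: map trained full-precision weights and activations to low-bit integers with scales calibrated after training, leaving the backbone untouched. A minimal sketch of symmetric per-tensor INT8 quantization (function names and the toy tensor are illustrative, not drawn from any cited method):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8: one scale derived from the max magnitude."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # rounding error is bounded by scale / 2
```

The papers above differ mainly in how the scale is chosen (per-channel, time-aware, rotation-based) and in how outliers are smoothed before rounding.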

VAE Compression

CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers

K Liu, S Zhang, L Kong, Y Zhang

2025 · arxiv.org
Open Paper

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha

2025 · Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Open Paper

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

Y Cheng, F Yuan

2025 · arxiv.org
Open Paper

QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution

B Chai, Z Chen, L Zhu, W Li, Y Guo, Y Zhang

2025 · arxiv.org
Open Paper

TR-DQ: Time-Rotation Diffusion Quantization

Yihua Shao, Deyang Lin, Fanhu Zeng, Minxi Yan, Muyang Zhang, Siyu Chen, Yuxuan Fan, Ziyang Yan, Haozhe Wang, Jingcai Guo, Yan Wang, Haotong Qin, Hao Tang

2025 · arxiv.org
Open Paper

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

2024 · arxiv.org
Open Paper

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinhua Cheng, Li Yuan

2024 · arxiv.org
Open Paper

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan

2024 · arxiv.org
Open Paper

Pruning

Token Pruning

Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

H Liu, Y Cheng, Z Liu, A Chen, Y Yao, C Chen

2025 · arxiv.org
Open Paper

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

J Chen, W He, Y Gu, Y Zhao, J Yu, J Chen

2025 · arxiv.org
Open Paper

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers

X He, Q Liu, Z Ye, W Ye, Q Wang, X Wang

2025 · arxiv.org
Open Paper

Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion

Huaize Liu, Wenzhang Sun, Qiyuan Zhang, Donglin Di, Biao Gong, Hao Li, Chen Wei, Changqing Zou

2025 · arxiv.org
Open Paper

Long-context autoregressive video modeling with next-frame prediction

Y Gu, W Mao, MZ Shou

2025 · arxiv.org
Open Paper

Packing input frame context in next-frame prediction models for video generation

L Zhang, M Agrawala

2025 · arxiv.org
Open Paper

VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate

Z Yuan, R Xie, Y Shang, H Zhang, S Wang

2025 · arxiv.org
Open Paper

VidTwin: Video VAE with Decoupled Structure and Dynamics

Yuchi Wang, Junliang Guo, Xinyi Xie, Tianyu He, Xu Sun, Jiang Bian

2025 · Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Open Paper

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, Dacheng Tao

2024 · International Conference on Machine Learning
Open Paper

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

AJ Piergiovanni, Weicheng Kuo, Anelia Angelova

2022 · Computer Vision and Pattern Recognition
Open Paper
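
Token pruning methods like those above drop low-importance tokens before the expensive attention layers and scatter the surviving outputs back afterwards. A minimal sketch with a magnitude-based saliency score (the scoring rule is a generic stand-in, not any specific paper's criterion):

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of tokens; return the kept tokens
    plus their original indices so outputs can be scattered back later."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    idx = np.argsort(scores)[::-1][:n_keep]
    idx.sort()  # preserve original order so positional embeddings stay valid
    return tokens[idx], idx

rng = np.random.default_rng(1)
toks = rng.normal(size=(16, 8))
scores = np.linalg.norm(toks, axis=-1)  # simple magnitude-based saliency
kept, idx = prune_tokens(toks, scores, keep_ratio=0.25)
```

Since attention is quadratic in token count, keeping 25% of tokens cuts that layer's cost to roughly 1/16 of the original.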

Channel Pruning

Efficient Diffusion-Based 3D Human Pose Estimation with Hierarchical Temporal Pruning

Y Bi, H Wang, X Shi, Z Gui, J Gui, YY Tang

2025 · IEEE Transactions on Circuits and Systems for Video Technology
Open Paper

Model Pruning

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

L Shen, G Gong, T He, Y Zhang, P Liu, S Zhao

2025 · arxiv.org
Open Paper

Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

Z Yang, D Xu, W Pang, Y Yuan

2025 · arxiv.org
Open Paper

Taming Diffusion Transformer for Efficient Mobile Video Generation in Seconds

Y Wu, Y Li, A Kag, I Skorokhodov, W Menapace

2025 · arxiv.org
Open Paper

VIP: Iterative Online Preference Distillation for Efficient Video Diffusion Models

J Kim, W Seo, J Kim, S Park

2025 · Proceedings of the IEEE/CVF International Conference on Computer Vision
Open Paper

Animated Stickers: Bringing Stickers to Life with Video Diffusion

D Yan, W Zhang, L Zhang, A Kalia, D Wang

2024 · arxiv.org
Open Paper

Individual content and motion dynamics preserved pruning for video diffusion models

Y Wu, Z Chen, H Wang, D Xu

2024 · arxiv.org
Open Paper

Mobile Video Diffusion

Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian

2024 · arxiv.org
Open Paper
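
Structured model pruning removes whole channels or blocks so the remaining network stays dense and hardware-friendly. A toy sketch of L1-magnitude channel pruning (a generic criterion chosen for illustration, not the content- or motion-aware scores used by the works above):

```python
import numpy as np

def prune_channels(w, keep_ratio=0.5):
    """Structured pruning: drop the output channels with the smallest
    L1 norm, returning the smaller weight and the surviving indices."""
    norms = np.abs(w).sum(axis=1)                  # per-output-channel L1 norm
    n_keep = max(1, int(w.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return w[keep], keep

w = np.diag([0.1, 5.0, 0.2, 4.0])                  # toy 4x4 weight matrix
w_small, kept = prune_channels(w, keep_ratio=0.5)  # the two largest channels survive
```

Unlike unstructured sparsity, the pruned matrix is simply smaller, so the speedup is realized on any hardware without sparse kernels.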

Cache and Trajectory Optimization

Cache

Feature Cache

Asymmetric VAE for One-Step Video Super-Resolution Acceleration

J Li, Y Guo, Y Zhang, X Yang

2025 · arxiv.org
Open Paper

Block-wise Adaptive Caching for Accelerating Diffusion Policy

K Ji, Y Meng, H Cui, Y Li, S Hua, L Chen

2025 · arxiv.org
Open Paper

BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching

H Cui, Z Tang, Z Xu, Z Yao, W Zeng, W Jia

2025 · arxiv.org
Open Paper

DiCache: Let Diffusion Model Determine Its Own Cache

J Bu, P Ling, Y Zhou, Y Wang, Y Zang, D Lin

2025 · arxiv.org
Open Paper

DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

W Wang, J Zhu, Z Zhang, X Wang, Z Zhu

2025 · arxiv.org
Open Paper

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

S Huang, L Chen, P Zhou, S Chen, Z Jiang

2025 · arxiv.org
Open Paper

ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion

X Peng, H Liu, C Yan, R Ma, F Chen, X Wang

2025 · arxiv.org
Open Paper

EVCtrl: Efficient Control Adapter for Visual Generation

Z Yang, Y Ma, Y Zhang, S Mo, D Liu

2025 · arxiv.org
Open Paper

FastCache: Fast Caching for Diffusion Transformer through Learnable Linear Approximation

D Liu, Y Yu, J Zhang, Y Li, B Lengerich

2025 · arxiv.org
Open Paper

Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers

S Zheng, L Feng, X Wang, Q Zhou, P Cai, C Zou

2025 · arxiv.org
Open Paper

Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation

M Adnan, N Kurella, A Arunkumar, PJ Nair

2025 · arxiv.org
Open Paper

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

J Liu, C Zou, Y Lyu, J Chen, L Zhang

2025 · arxiv.org
Open Paper

Hero: Hierarchical extrapolation and refresh for efficient world models

Q Song, X Wang, D Zhou, J Lin, C Chen, Y Ma

2025 · arxiv.org
Open Paper

HiCache: Training-Free Acceleration of Diffusion Models via Hermite Polynomial-Based Feature Caching

L Feng, S Zheng, J Liu, Y Lin, Q Zhou, P Cai

2025 · arxiv.org
Open Paper

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

Xin Zhou, Dingkang Liang, Kaijin Chen, Tianrui Feng, Xiwu Chen, Hongkai Lin, Yikang Ding, Feiyang Tan, Hengshuang Zhao, Xiang Bai

2025 · arxiv.org
Open Paper

Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers

S Zheng, G Chen, Q Zhou, Y Lin, L He, C Zou

2025 · arxiv.org
Open Paper

LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation

Yang Xiao, Gen Li, Kaiyuan Deng, Yushu Wu, Zheng Zhan, Yanzhi Wang, Xiaolong Ma, Bo Hui

2025 · arxiv.org
Open Paper

MagCache: Fast Video Generation with Magnitude-Aware Cache

Z Ma, L Wei, F Wang, S Zhang, Q Tian

2025 · arxiv.org
Open Paper

MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration

Y Wei, L Diao, B Chen, S Cheng, Z Qian, W Yu

2025 · arxiv.org
Open Paper

QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation

J Wu, Z Li, Z Hui, Y Zhang, L Kong, X Yang

2025 · arxiv.org
Open Paper

Rethinking video tokenization: A conditioned diffusion-based approach

N Yang, P Li, L Zhao, Y Li, CW Xie, Y Tang

2025 · arxiv.org
Open Paper

SortBlock: Similarity-Aware Feature Reuse for Diffusion Model

H Chen, X Zhang, X Guan, L Jiang, G Wang

2025 · arxiv.org
Open Paper

SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling

F Ye, Z Zhao, Y Mu, J Shen, R Li, K Wang

2025 · arxiv.org
Open Paper

TaoCache: Structure-Maintained Video Generation Acceleration

Z Fan, Z Wang, W Zhang

2025 · arxiv.org
Open Paper

Turbo-VAED: Fast and Stable Transfer of Video VAEs to Mobile Devices

Y Zou, J Yao, S Yu, S Zhang, W Liu, X Wang

2025 · arxiv.org
Open Paper

Accelerating diffusion transformers with dual feature caching

C Zou, E Zhang, R Guo, H Xu, C He, X Hu

2024 · arxiv.org
Open Paper

Accelerating diffusion transformers with token-wise feature caching

C Zou, X Liu, T Liu, S Huang, L Zhang

2024 · arxiv.org
Open Paper

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie

2024 · arxiv.org
Open Paper

Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism

J Zheng, J Wang, F Zhao, X Zhang, X Liang

2024 · arxiv.org
Open Paper

F$^3$-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

Sitong Su, Jianzhi Liu, Lianli Gao, Jingkuan Song

2024 · Proceedings of the AAAI Conference on Artificial Intelligence
Open Paper

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong

2024 · International Conference on Learning Representations
Open Paper

FlexCache: Flexible approximate cache system for video diffusion

D Sun, H Tian, T Lu, S Liu

2024 · arxiv.org
Open Paper

Real-Time Video Generation with Pyramid Attention Broadcast

Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You

2024 · International Conference on Learning Representations
Open Paper

Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study

X Sun, J Fang, A Li, J Pan

2024 · arxiv.org
Open Paper

Pix2Video: Video Editing using Image Diffusion

Duygu Ceylan, Chun-Hao P. Huang, Niloy J. Mitra

2023 · IEEE International Conference on Computer Vision
Open Paper
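
The feature-caching works above exploit the same observation: intermediate activations change little between adjacent denoising steps, so a block's output can be reused instead of recomputed. A stripped-down sketch of that control flow (the relative-change metric and threshold are illustrative, not any particular paper's policy):

```python
import numpy as np

class FeatureCache:
    """Reuse a block's output across adjacent denoising steps whenever
    the block's input barely moved since the last real computation."""
    def __init__(self, tol=0.05):
        self.tol, self.x_prev, self.y_prev = tol, None, None

    def __call__(self, block, x):
        if self.x_prev is not None:
            rel = np.linalg.norm(x - self.x_prev) / (np.linalg.norm(self.x_prev) + 1e-8)
            if rel < self.tol:
                return self.y_prev          # cache hit: skip the block entirely
        y = block(x)                        # cache miss: recompute and store
        self.x_prev, self.y_prev = x.copy(), y
        return y

block = lambda x: 2.0 * x                   # stand-in for a heavy DiT block
cache = FeatureCache(tol=0.05)
y1 = cache(block, np.ones(4))
y2 = cache(block, np.ones(4) + 1e-4)        # tiny drift between steps -> reuse
```

The papers above refine every piece of this loop: what signal triggers reuse, whether stale features are extrapolated (Taylor, Hermite) rather than copied, and at what granularity (token, block, layer) the cache operates.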

KV Cache

PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache

K Li, M Shah, Y Shang

2026 · arxiv.org
Open Paper

Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios

H Shi, J Liang, R Xie, X Wu, C Chen, C Liu

2025 · arxiv.org
Open Paper

CausNVS: Autoregressive Multi-View Diffusion for Flexible 3D Novel View Synthesis

X Kong, D Watson, Y Strümpler, M Niemeyer

2025 · arxiv.org
Open Paper

dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought

J Wen, M Zhu, J Liu, Z Liu, Y Yang, L Zhang

2025 · arxiv.org
Open Paper

EgoLCD: Egocentric Video Generation with Long Context Diffusion

L Zhang, J Ye, Y Wang, M Zhong, M Cao, W Xia

2025 · arxiv.org
Open Paper

LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4×RTX 4090s

X Wang, X Li, B Li, Z Chen

2025 · arxiv.org
Open Paper

Long context tuning for video generation

Y Guo, C Yang, Z Yang, Z Ma, Z Lin, Z Yang

2025 · arxiv.org
Open Paper

LoViC: Efficient Long Video Generation with Context Compression

J Jiang, W Li, J Ren, Y Qiu, Y Guo, X Xu, H Wu

2025 · arxiv.org
Open Paper

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

S Ji, X Chen, S Yang, X Tao, P Wan, H Zhao

2025 · arxiv.org
Open Paper

Physical autoregressive model for robotic manipulation without action pretraining

Z Song, S Qin, T Chen, L Lin, G Wang

2025 · arxiv.org
Open Paper

PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation

J He, B Su, F Wong

2025 · arxiv.org
Open Paper

Pretraining Frame Preservation in Autoregressive Video Memory Compression

L Zhang, S Cai, M Li, C Zeng, B Lu, A Rao

2025 · arxiv.org
Open Paper

Self-guidance: Boosting flow and diffusion generation on their own

T Li, W Luo, Z Chen, L Ma, GJ Qi

2025 · arxiv.org
Open Paper

Taming Flow-Based I2V Models for Creative Video Editing

X Kong, H Chen, Y Guo, L Zhang, G Wetzstein

2025 · arxiv.org
Open Paper

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Y Feng, C Xiang, X Mao, H Tan, Z Zhang

2025 · arxiv.org
Open Paper

VideoMAR: Autoregressive Video Generation with Continuous Tokens

H Yu, B Gong, H Yuan, DD Zheng, W Chai

2025 · arxiv.org
Open Paper

Yan: Foundational interactive video generation

D Ye, F Zhou, J Lv, J Ma, J Zhang, J Lv, J Li

2025 · arxiv.org
Open Paper

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao, Long Chen

2024 · International Conference on Machine Learning
Open Paper

Live2Diff: Live Stream Translation via Uni-Directional Attention in Video Diffusion Models

Z Xing, G Fox, Y Zeng, X Pan, M Elgharib

2024 · arxiv.org
Open Paper

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao

2024 · arxiv.org
Open Paper
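
For the autoregressive models above, the dominant memory cost is the key/value history of past frames; most of these works bound it with a fixed or compressed context window. A minimal sketch of a rolling per-frame KV cache (the string placeholders stand in for real key/value tensors):

```python
from collections import deque

class RollingKVCache:
    """Fixed-window KV cache for frame-causal attention: keep keys/values
    for only the last `window` frames so memory stays bounded as the
    video grows (window size is an illustrative choice)."""
    def __init__(self, window=4):
        self.k = deque(maxlen=window)   # deque evicts the oldest frame automatically
        self.v = deque(maxlen=window)

    def append(self, k_frame, v_frame):
        self.k.append(k_frame)
        self.v.append(v_frame)

    def context(self):
        return list(self.k), list(self.v)

cache = RollingKVCache(window=4)
for t in range(10):
    cache.append(f"K{t}", f"V{t}")
ks, vs = cache.context()                # only the 4 most recent frames survive
```

The cited methods go further than plain eviction: compressing old frames, pinning "sink" frames that anchor long-range consistency, or sharing cache entries across denoising steps.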

Latent Trajectory Tricks

Noise and State Modification

Long-context state-space video world models

R Po, Y Nitzan, R Zhang, B Chen, T Dao

2025 · arxiv.org
Open Paper

Pretraining Frame Preservation in Autoregressive Video Memory Compression

L Zhang, S Cai, M Li, C Zeng, B Lu, A Rao

2025 · arxiv.org
Open Paper

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Y Li, Y Ge, Y Ge, P Luo, Y Shan

2024 · arxiv.org
Open Paper

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

2023 · International Conference on Learning Representations
Open Paper

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel

2023 · International Conference on Learning Representations
Open Paper

Trajectory Modification

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

J Chen, J Hu, J Lasenby, A Tewari

2026 · arxiv.org
Open Paper

A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation

SZ Zhou, YB Wang, JF Wu, T Hu, JN Zhang

2025 · arxiv.org
Open Paper

Accelerating Diffusion Sampling via Exploiting Local Transition Coherence

S Zhu, H Zhang, Z Yang, Q Peng, Z Pu, H Wang

2025 · arxiv.org
Open Paper

DiTPainter: Efficient Video Inpainting with Diffusion Transformers

X Wu, C Liu

2025 · arxiv.org
Open Paper

EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models

J Xie, S Zhang, Z Zhao, F Wu, F Wu

2025 · arxiv.org
Open Paper

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Binyue Peng, Ping Luo

2025 · arxiv.org
Open Paper

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

T Liu, Z Huang, Z Chen, G Wang, S Hu, L Shen

2025 · arxiv.org
Open Paper

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

P Liu, L Song, J Huang, H Liu, C Xu

2025 · arxiv.org
Open Paper

MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer

P Liu, J Wang, Y Shen, S Mo, C Qi, Y Ma

2025 · arxiv.org
Open Paper

On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices

B Kim, K Lee, I Jeong, J Cheon, Y Lee

2025 · arxiv.org
Open Paper

One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer

H Wu, J Xu, Q Miao, D Samaras, H Le

2025 · arxiv.org
Open Paper

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

S Cheng, Y Wei, L Diao, Y Liu, B Chen

2025 · arxiv.org
Open Paper

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision

S Zhuang, Y Guo, Y Ding, K Li, X Chen, Y Wang

2025 · arxiv.org
Open Paper

Training-free Diffusion Acceleration with Bottleneck Sampling

Ye Tian, Xin Xia, Yuxi Ren, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Yunhai Tong, Ling Yang, Bin Cui

2025 · arxiv.org
Open Paper

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

A Sabour, S Fidler, K Kreis

2024 · arxiv.org
Open Paper

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Z Li, S Hu, S Liu, L Zhou, J Choi, L Meng, X Guo

2024 · arxiv.org
Open Paper

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

2024 · International Conference on Learning Representations
Open Paper

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

P Ling, J Bu, P Zhang, X Dong, Y Zang, T Wu

2024 · arxiv.org
Open Paper

Pyramidal Flow Matching for Efficient Video Generative Modeling

Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, Zhouchen Lin

2024 · International Conference on Learning Representations
Open Paper

Single Trajectory Distillation for Accelerating Image and Video Style Transfer

Sijie Xu, Runqi Wang, Wei Zhu, Dejia Song, Nemo Chen, Xu Tang, Yao Hu

2024 · Proceedings of the 33rd ACM International Conference on Multimedia
Open Paper

Streaming diffusion policy: Fast policy synthesis with variable noise diffusion models

SH Høeg, Y Du, O Egeland

2024 · arxiv.org
Open Paper

Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation

Jie An, Songyang Zhang, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, Xi Yin

2023 · arxiv.org
Open Paper

Structure and Content-Guided Video Synthesis with Diffusion Models

Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

2023 · IEEE International Conference on Computer Vision
Open Paper
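
A recurring lever in the trajectory-modification works above, made explicit in Align Your Steps, is *where* the few remaining sampling steps are placed: non-uniform schedules spend steps where the ODE trajectory curves most. A sketch of a rho-spaced noise schedule (default constants are illustrative, not tuned values from any cited paper):

```python
import numpy as np

def rho_schedule(n, sigma_min=0.02, sigma_max=80.0, rho=7.0):
    """Non-uniform noise schedule with rho-spacing: interpolate linearly in
    sigma**(1/rho), so steps cluster at low noise levels."""
    ramp = np.linspace(0.0, 1.0, n)
    inv = 1.0 / rho
    return (sigma_max**inv + ramp * (sigma_min**inv - sigma_max**inv)) ** rho

sigmas = rho_schedule(8)   # 8 noise levels, dense near sigma_min
```

With rho = 1 this degenerates to uniform spacing; larger rho concentrates steps at low noise, which is typically where few-step samplers lose the most quality.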

Parallel Computation

Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models

T Cheng, Z Zhang, K Gao, J Xiao

2025 · arxiv.org
Open Paper

Block Cascading: Training Free Acceleration of Block-Causal Video Models

H Bandyopadhyay, N Pinnaparaju, R Entezari

2025 · arxiv.org
Open Paper

Inference-time text-to-video alignment with diffusion latent beam search

Y Oshima, M Suzuki, Y Matsuo, H Furuta

2025 · arxiv.org
Open Paper

Video latent flow matching: Optimal polynomial projections for video interpolation and extrapolation

Y Cao, Z Song, C Yang

2025 · arxiv.org
Open Paper

Open-Sora: Democratizing Efficient Video Production for All

Zangwei Zheng, Xiangyu Peng, Tianji Yang, Chenhui Shen, Shenggui Li, Hongxin Liu, Yukun Zhou, Tianyi Li, Yang You

2024 · arxiv.org
Open Paper

DiffCollage: Parallel Generation of Large Content with Diffusion Models

Qinsheng Zhang, Jiaming Song, Xun Huang, Yongxin Chen, Ming-Yu Liu

2023 · Computer Vision and Pattern Recognition
Open Paper
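
Several of the parallel schemes above, DiffCollage among them, denoise overlapping segments independently and then reconcile the overlaps into one coherent sequence. A minimal sketch of the stitching step using uniform averaging (real systems weight the overlap region more carefully):

```python
import numpy as np

def merge_overlapping(chunks, starts, total_len):
    """Average overlapping denoised segments back into a single sequence:
    accumulate each chunk at its offset, then divide by the coverage count."""
    out = np.zeros(total_len)
    cnt = np.zeros(total_len)
    for c, s in zip(chunks, starts):
        out[s:s + len(c)] += c
        cnt[s:s + len(c)] += 1
    return out / np.maximum(cnt, 1)

a = np.ones(6)             # worker 1: frames 0..5
b = np.full(6, 3.0)        # worker 2: frames 4..9
merged = merge_overlapping([a, b], [0, 4], 10)
```

The overlap is what keeps independently generated windows consistent at their seams; without it, each worker's segment would drift freely.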

Other Efficiency Methods

db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism

S Chen, K Hong, T Zhao, R Xie, Z Zhu, X Zhang

2025 · arxiv.org
Open Paper

PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

S Wang, Q Wang, S Shi

2025 · arxiv.org
Open Paper

ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation

J Sun, W Wang, M Sun, Y Yang, X Zhu, J Liu

2025 · arxiv.org
Open Paper

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

J Fang, J Pan, X Sun, A Li, J Wang

2024 · arxiv.org
Open Paper

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

2022 · Computer Vision and Pattern Recognition
Open Paper