LIVEditor

Lightning Unified Video Editing via In-Context Sparse Attention

Shitong Shao, Zikai Zhou, Haopeng Li, Yingwei Song, Wenliang Zhong, Lichen Bai, Zeke Xie
ICML 2026 · Project Page

TL;DR: LIVEditor accelerates in-context video editing with In-Context Sparse Attention (ISA), enabling fast, unified, and high-fidelity video editing.

Abstract

LIVEditor targets efficient unified video editing. Instead of applying dense attention over the entire long video context, it introduces In-Context Sparse Attention (ISA), which retrieves the most relevant contextual tokens and attends only to them. The resulting editor keeps the in-context editing formulation while substantially reducing attention cost, making high-quality video editing faster and easier to deploy.
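To make the idea of "retrieve and attend to the most relevant contextual tokens" concrete, here is a minimal sketch of top-k sparse attention for a single query. It is not the paper's implementation. It rests on several assumptions: relevance is scored by the raw query-key dot product, every target token is kept, and only the `top_k` best-matching context tokens are retained. The names `attend`, `sparse_context_attend`, and `top_k` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Dense scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def sparse_context_attend(query, tgt_keys, tgt_values, ctx_keys, ctx_values, top_k):
    """Attend to all target tokens plus only the top_k best-matching context tokens.

    Assumption (not from the paper): relevance is the query-key dot product.
    This sketch still scores every context token during selection, so it shows
    the selection logic only, not an actual speedup.
    """
    top_k = min(top_k, len(ctx_keys))
    # Rank context tokens by relevance to this query, most relevant first.
    ranked = sorted(
        range(len(ctx_keys)),
        key=lambda j: dot(query, ctx_keys[j]),
        reverse=True,
    )
    keep = ranked[:top_k]
    keys = tgt_keys + [ctx_keys[j] for j in keep]
    values = tgt_values + [ctx_values[j] for j in keep]
    return attend(query, keys, values)
```

When `top_k` equals the number of context tokens, the function reduces to dense attention over target and context tokens together, which is a convenient sanity check. A practical sparse kernel would need a cheaper selection step than scoring every context token, for example scoring at a coarser block level. That is also an assumption here, not a description of ISA.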

Highlights

A compact summary of the method and demo focus.

Unified Video Editing

A single editor handles diverse text-guided video editing cases in an in-context formulation.

In-Context Sparse Attention

ISA prunes redundant context and routes queries to relevant source-video tokens.

Fast Inference

This demo focuses on visual comparisons; the large attention-latency reductions are reported in the paper.
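As a rough intuition for why sparsifying the context lowers attention cost, the sketch below counts query-key pairs under a simplified model. The model assumes that only target tokens issue queries, and that each query scores all target keys plus either all context keys (dense) or a kept subset (sparse). The function names and token counts are hypothetical and are not taken from the paper. The count also ignores the cost of selecting which context tokens to keep.

```python
def attention_pairs(num_target, num_context, kept_context=None):
    """Count query-key pairs when target queries see target + context keys.

    kept_context=None models dense attention over the full context; otherwise
    each query sees only kept_context context keys (capped at num_context).
    """
    ctx = num_context if kept_context is None else min(kept_context, num_context)
    return num_target * (num_target + ctx)


def pair_reduction(num_target, num_context, kept_context):
    """Ratio of dense to sparse pair counts under the simplified model."""
    dense = attention_pairs(num_target, num_context)
    sparse = attention_pairs(num_target, num_context, kept_context)
    return dense / sparse


# Hypothetical sizes, for illustration only.
print(pair_reduction(num_target=1000, num_context=1000, kept_context=250))  # 1.6
```

Under this model the ratio is bounded by `(num_target + num_context) / num_target`, because attention among the target tokens stays dense. Measured latency also depends on kernel efficiency and memory access patterns, which a pair count does not capture.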

Comparison Results

We compare source videos, full-attention editing, and ISA-accelerated editing results.

Full-Attention vs. ISA Comparisons

Each row is a side-by-side comparison of Source, Full Attention, and ISA under the same editing instruction.

[Six video comparisons, each with three panels: Source | Full Attention | ISA]

Source vs. ISA Editing Results

Each pair shows the source (reference) video and the edited result produced with ISA acceleration.

[Nine video pairs, each with two panels: Source | ISA Result]