Unified Video Editing
A single editor handles diverse text-guided video editing cases in an in-context formulation.
TL;DR: LIVEditor accelerates in-context video editing with In-Context Sparse Attention (ISA), enabling fast, unified, and high-fidelity video editing.
LIVEditor targets efficient unified video editing. Instead of applying dense attention over the full long video context, it introduces In-Context Sparse Attention (ISA), which retrieves and attends to only the most relevant contextual tokens. The resulting editor keeps the in-context editing formulation while substantially reducing attention cost, making high-quality video editing faster and easier to deploy.
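The retrieval idea described above can be pictured with a generic block-routed sparse attention. The sketch below is a minimal NumPy illustration under stated assumptions, not LIVEditor's actual ISA. It assumes that queries come from the tokens being edited, that keys and values come from the source-video context, that each context block is summarized by its mean key, and that each query attends only to its `top_k` best-scoring blocks. The names `routed_context_attention`, `dense_attention`, `block_size`, and `top_k` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # Reference: every query scores every context token.
    return softmax(q @ k.T / np.sqrt(q.shape[1])) @ v


def routed_context_attention(q, k_ctx, v_ctx, block_size=4, top_k=2):
    """Hypothetical sketch: each query attends only to its top_k context blocks."""
    n_ctx, d = k_ctx.shape
    assert n_ctx % block_size == 0, "context length must be a multiple of block_size"
    n_blocks = n_ctx // block_size
    # Summarize each context block by its mean key (a cheap routing signature).
    block_keys = k_ctx.reshape(n_blocks, block_size, d).mean(axis=1)  # (n_blocks, d)
    # Score each query against every block summary and keep the top_k blocks.
    route_scores = q @ block_keys.T                                   # (n_q, n_blocks)
    top_blocks = np.argsort(-route_scores, axis=1)[:, :top_k]         # (n_q, top_k)
    out = np.empty((q.shape[0], v_ctx.shape[1]))
    for i, blocks in enumerate(top_blocks):
        # Expand the selected block ids into the token indices they cover.
        idx = (blocks[:, None] * block_size + np.arange(block_size)).ravel()
        scores = k_ctx[idx] @ q[i] / np.sqrt(d)
        out[i] = softmax(scores) @ v_ctx[idx]
    return out, top_blocks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((3, 8))    # queries from the video being edited
    k = rng.standard_normal((16, 8))   # keys from the source-video context
    v = rng.standard_normal((16, 8))   # values from the source-video context
    out, blocks = routed_context_attention(q, k, v, block_size=4, top_k=2)
    print(out.shape, blocks.shape)     # (3, 8) (3, 2)
    # Selecting every block recovers dense attention exactly.
    full, _ = routed_context_attention(q, k, v, block_size=4, top_k=4)
    print(np.allclose(full, dense_attention(q, k, v)))  # True
```

With `block_size=4` and `top_k=2`, each query computes full scores for only 8 of the 16 context tokens, plus 4 cheap block-summary scores. Setting `top_k` to the number of blocks reduces the routed result to dense attention, which makes a convenient sanity check.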
A compact summary of the method and demo focus.
ISA prunes redundant context and routes queries to relevant source-video tokens.
The demo focuses on visual comparison results, while the paper reports large reductions in attention latency.
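One way to picture "pruning redundant context" is temporal deduplication of source-video tokens. The sketch below uses a hypothetical criterion, not the one ISA uses, since the page does not specify it. A token is dropped when its cosine similarity to the last kept token at the same spatial position reaches `threshold`. The function name `prune_redundant_tokens` and the default threshold are illustrative assumptions.

```python
import numpy as np


def prune_redundant_tokens(tokens, threshold=0.95):
    """Hypothetical sketch. tokens: (frames, tokens_per_frame, dim).

    Returns the flat indices of the context tokens that are kept.
    """
    f, n, _ = tokens.shape
    unit = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    keep = np.zeros((f, n), dtype=bool)
    keep[0] = True                    # the first frame is always kept
    anchor = unit[0].copy()           # last kept token at each spatial position
    for t in range(1, f):
        # Cosine similarity to the last kept token at the same position.
        sim = (unit[t] * anchor).sum(axis=-1)
        novel = sim < threshold
        keep[t] = novel
        anchor[novel] = unit[t][novel]  # update anchors only where content changed
    return np.flatnonzero(keep.ravel())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((1, 4, 8))
    video = np.repeat(base, 3, axis=0)  # a static 3-frame clip with 4 tokens/frame
    video[2, 0] = -video[2, 0]          # one token changes in the last frame
    print(prune_redundant_tokens(video).tolist())  # [0, 1, 2, 3, 8]
```

Comparing against the last kept token, rather than the immediately preceding frame, keeps slow gradual drift from being pruned indefinitely. The surviving indices could then serve as the candidate context that queries are routed over.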
Representative visual comparisons for full-attention editing and ISA-accelerated editing.
We compare source videos, full-attention editing, and ISA-accelerated editing results.
Each row is a side-by-side comparison of Source, Full Attention, and ISA under the same editing instruction.
Each pair shows the source/reference video and the edited result produced with ISA acceleration.