LIVEditor-14B

Abstract

LIVEditor-14B targets efficient unified video editing. Instead of processing the long video context with dense attention, it introduces In-Context Sparse Attention (ISA), which retrieves and attends to only the most relevant contextual tokens. The resulting editor keeps the in-context editing formulation while substantially reducing attention cost, making high-quality video editing faster and easier to deploy.
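The abstract describes ISA only at a high level. As a rough illustration of what "retrieve and attend to the most relevant contextual tokens" can mean, the following minimal sketch implements generic token-level top-k attention for a single query in plain Python. It is not the authors' ISA: the function names and the top-k selection rule are hypothetical. Because it still scores every key, it shows the selection step rather than the speedup.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Hypothetical sketch: attend from one query to its k best-scoring context tokens."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep the k highest-scoring tokens; the stable sort breaks ties by lower index.
    selected = sorted(range(len(keys)), key=lambda i: -scores[i])[:k]
    weights = softmax([scores[i] for i in selected])
    # Weighted sum of the selected values only; all other tokens get zero weight.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out, sorted(selected)


# Usage: a 2-d query attends to 2 of 4 context tokens.
q = [1.0, 0.0]
ctx_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
ctx_vals = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [5.0, 5.0]]
out, selected = topk_sparse_attention(q, ctx_keys, ctx_vals, k=2)
print(selected, out)
```

Setting `k` to the number of context tokens recovers ordinary dense attention, which makes the sparsity level a single tunable knob in this sketch.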

Highlights

A compact summary of the method and the focus of this demo.

Unified Video Editing

A single editor handles diverse text-guided video editing cases in an in-context formulation.

In-Context Sparse Attention

ISA prunes redundant context and routes queries to relevant source-video tokens.
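One common way to prune context without scoring every key is block-level routing: pool the keys of each contiguous block, score the query against the pooled keys, and run exact attention only inside the best-scoring blocks. The sketch below shows that generic technique in plain Python. The page does not say ISA works this way; the block size, mean pooling, and helper names (`route_query_to_blocks`, `block_sparse_attention`) are hypothetical choices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors):
    # Average a list of equal-length vectors column by column.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def route_query_to_blocks(query, keys, block_size, num_blocks):
    """Hypothetical routing: keep the blocks whose mean-pooled key best matches the query."""
    blocks = [list(range(s, min(s + block_size, len(keys))))
              for s in range(0, len(keys), block_size)]
    block_scores = [dot(query, mean_pool([keys[i] for i in b])) for b in blocks]
    order = sorted(range(len(blocks)), key=lambda b: -block_scores[b])
    kept = sorted(order[:num_blocks])
    # Flatten the surviving blocks back into token indices, in original order.
    return [i for b in kept for i in blocks[b]]


def block_sparse_attention(query, keys, values, block_size, num_blocks):
    idx = route_query_to_blocks(query, keys, block_size, num_blocks)
    scale = 1.0 / math.sqrt(len(query))
    # Exact softmax attention, restricted to tokens in the routed blocks.
    scores = [dot(query, keys[i]) * scale for i in idx]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    out = [0.0] * len(values[0])
    for e, i in zip(exps, idx):
        for j, v in enumerate(values[i]):
            out[j] += (e / total) * v
    return out, idx


# Usage: 8 context tokens in 4 blocks of 2; the query is routed to 2 blocks.
ctx_keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
            [0.8, 0.2], [0.9, 0.1], [-1.0, 0.0], [-1.0, 0.0]]
ctx_vals = [[float(i)] for i in range(8)]
out, idx = block_sparse_attention([1.0, 0.0], ctx_keys, ctx_vals, block_size=2, num_blocks=2)
print(idx, out)
```

Routing per block rather than per token keeps the selection cheap (one score per block) and yields contiguous memory accesses, which is why block granularity is a popular design choice for sparse attention kernels.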

Fast Inference

The demo focuses on visual comparison results, while the paper reports large reductions in attention latency.
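This page gives no latency numbers. To show why sparse routing can cut attention cost, the sketch below counts query-key score evaluations for dense attention versus a generic block-routing scheme (each query is scored against one pooled key per block, then attends densely within its top blocks). The sequence lengths, block size, and block budget are hypothetical, and pair counts are only a proxy for compute; measured latency also depends on kernels and hardware.

```python
def dense_pairs(n_q, n_kv):
    # Every query scores every key.
    return n_q * n_kv


def block_sparse_pairs(n_q, n_kv, block_size, num_blocks):
    # Hypothetical routed scheme; an upper bound if the last block is partial.
    n_blocks = -(-n_kv // block_size)  # ceiling division
    routing = n_q * n_blocks  # one pooled key per block
    attend = n_q * min(num_blocks, n_blocks) * block_size
    return routing + attend


# Hypothetical sizes: 16384 target queries over 32768 context tokens.
dense = dense_pairs(16384, 32768)
sparse = block_sparse_pairs(16384, 32768, block_size=64, num_blocks=32)
print(dense, sparse, sparse / dense)
```

With these hypothetical sizes the routed scheme evaluates (512 + 2048) / 32768 = 7.8% of the dense score pairs, and the saving grows as the context gets longer while the block budget stays fixed.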

Gallery

Representative visual comparisons for full-attention editing and ISA-accelerated editing.

Comparison Results

We compare source videos, full-attention editing, and ISA-accelerated editing results.

Full-Attention vs. ISA Comparisons

Each row is a side-by-side comparison of Source, Full Attention, and ISA under the same editing instruction.

Source | Full Attention | ISA

Source vs. ISA Editing Results

Each pair shows the source/reference video and the edited result produced by ISA acceleration.

[Nine video pairs, each labeled Source | ISA Result]