
GitHub - DepthAnything/Video-Depth-Anything: [CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

A CVPR 2025 Highlight paper that tackles temporal flickering in video depth estimation through cross-frame attention, enabling consistent depth maps across thousands of frames without retraining for different video lengths.

Summary

• Introduces cross-frame attention architecture that processes videos in 8-frame chunks to eliminate flickering artifacts that plague single-image depth models applied to video
• Trained on 17.6M video clips (8.8B frames)—orders of magnitude larger than previous video depth datasets
• Handles arbitrary video lengths through sliding window approach: same model works on 10-frame clips or 10,000-frame movies
• Provides both inference and training code plus pre-trained models for immediate use
• Solves the production-blocking problem of temporal inconsistency in depth estimation for film, AR/VR, and autonomous systems

Video Depth Anything tackles the fundamental problem that existing depth estimation models, even state-of-the-art ones like Depth Anything v1/v2, produce flickering artifacts when applied to video because they process frames independently. The authors argue that video depth requires a video-native architecture, not just image models run frame by frame. Their solution uses cross-frame attention mechanisms that process videos in 8-frame chunks, maintaining temporal consistency by explicitly modeling inter-frame relationships.
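To make the idea concrete, here is a minimal NumPy sketch of cross-frame (temporal) attention over 8-frame chunks. This is an illustration of the mechanism the summary describes, not the repo's actual code: the function name, shapes, and single-head formulation are all assumptions, and a real implementation would use learned query/key/value projections.

```python
import numpy as np

def cross_frame_attention(features, chunk=8):
    """Toy single-head temporal attention (illustrative, not the repo's API).

    Within each chunk of `chunk` frames, every frame attends to all frames
    at the same spatial token, so per-frame features are blended across
    time instead of being computed independently.
    features: array of shape (T, N, D) = (frames, spatial tokens, channels)
    """
    T, N, D = features.shape
    out = np.empty_like(features)
    for start in range(0, T, chunk):
        x = features[start:start + chunk]            # (t, N, D)
        # Treat the temporal axis as the attention sequence, per token.
        q = k = v = np.transpose(x, (1, 0, 2))       # (N, t, D)
        scores = q @ np.transpose(k, (0, 2, 1)) / np.sqrt(D)  # (N, t, t)
        scores -= scores.max(axis=-1, keepdims=True)          # stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[start:start + chunk] = np.transpose(w @ v, (1, 0, 2))
    return out
```

Note the behavior this buys: if the frames in a chunk are identical (a static scene), uniform attention returns them unchanged, while noisy per-frame features get averaged toward a temporally consistent estimate.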

The scale of their training data is unprecedented: 17.6M video clips containing 8.8B frames, compared to previous video depth datasets with only millions of frames. This massive dataset, combined with temporal consistency losses, enables the model to learn stable depth predictions across time. The architecture uses a sliding window approach that processes long videos in overlapping chunks, maintaining consistency at chunk boundaries. Critically, the model generalizes to arbitrary video lengths without retraining—you can feed it a 10-frame clip or a feature-length film.
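The overlapping-chunk inference pattern described above can be sketched in a few lines. This is a generic sliding-window scheme under assumed window/overlap sizes, not the authors' exact stitching strategy; averaging the overlap is the simplest way to keep chunk boundaries consistent.

```python
import numpy as np

def sliding_windows(n_frames, window=8, overlap=2):
    """Yield (start, end) spans covering n_frames with overlapping chunks.

    Window and overlap sizes are illustrative; overlap must be < window.
    """
    assert 0 <= overlap < window
    step = window - overlap
    start = 0
    while True:
        end = min(start + window, n_frames)
        yield start, end
        if end == n_frames:
            break
        start += step

def stitch(depth_chunks, spans, n_frames):
    """Average per-frame depth where chunks overlap so boundaries agree."""
    acc = np.zeros(n_frames)
    cnt = np.zeros(n_frames)
    for d, (s, e) in zip(depth_chunks, spans):
        acc[s:e] += d
        cnt[s:e] += 1
    return acc / cnt
```

Because the window slides by `window - overlap`, the same code covers a 10-frame clip (one window) or a 10,000-frame movie (many windows) with no change to the model, which is the length-generalization property the paragraph describes.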

The practical implications are significant for any application requiring depth estimation over time. Film and video production can now generate consistent depth maps for effects work without manual cleanup of flickering. AR/VR applications get stable depth for video content. Autonomous systems can maintain consistent spatial understanding across continuous operation. The team provides complete inference code, training code, and pre-trained models, making this immediately usable for production applications that were previously blocked by temporal inconsistency issues.