Veo 3.1 Ingredients to Video: New video generation model updates

Google's enhanced Veo 3.1 lets creators generate mobile-optimized videos from reference images, text prompts, and style inputs, shifting AI video from "describe it" to "show it" and giving creators more direct control over the result.

Jan 14, 2026 · ai ml

Google is introducing an enhanced version of Veo 3.1 "Ingredients to Video" that fundamentally changes how creators interact with AI video generation. Instead of relying solely on text descriptions, the model now accepts three types of "ingredients": reference images (for characters and objects), text prompts (for scenes and actions), and style references (for visual aesthetic). This multi-modal approach gives creators much more precise control over the output while maintaining consistency across frames.
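To make the three ingredient types concrete, here is a minimal sketch of how a client might bundle them before submitting a generation job. The class name `IngredientsRequest` and all of its fields are hypothetical, chosen for illustration only; this is not the Veo API, and whether a style reference is supplied as an image or as a named aesthetic is an assumption here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IngredientsRequest:
    """Hypothetical container for the three ingredient types (not a real Veo API type)."""

    # Text prompt: describes the scene and the action.
    prompt: str
    # Reference images: paths to pictures of the characters or objects to keep consistent.
    reference_images: List[str] = field(default_factory=list)
    # Style reference: assumed here to be an image path or a style name such as "watercolor".
    style_reference: Optional[str] = None

    def validate(self) -> None:
        """Reject requests that have no usable text prompt."""
        if not self.prompt.strip():
            raise ValueError("a text prompt is required")

    def summary(self) -> str:
        """Return a one-line description of which ingredients are present."""
        parts = [
            f"prompt={self.prompt!r}",
            f"{len(self.reference_images)} reference image(s)",
            "style reference" if self.style_reference else "no style reference",
        ]
        return ", ".join(parts)


if __name__ == "__main__":
    request = IngredientsRequest(
        prompt="a fox runs through snow",
        reference_images=["fox.png"],
        style_reference="watercolor",
    )
    request.validate()
    print(request.summary())
```

Keeping the ingredients as separate fields mirrors the division of labor described above: images pin down what appears, the prompt says what happens, and the style reference sets the look.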

The update targets mobile content creators with native 9:16 vertical video output, addressing the growing demand for social media content. The reference image capability is particularly significant because it addresses one of AI video's biggest challenges: keeping characters and objects consistent throughout a generated clip. Creators can now show the model exactly what they want rather than hoping a text description captures their vision. The style transfer feature adds another layer of creative control, letting users apply specific visual aesthetics, such as film noir or watercolor, to their generated videos.
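As a small worked example of what a 9:16 vertical frame means in pixels, the sketch below derives the width from a chosen height. The helper name `vertical_frame` is hypothetical, and the heights used are common video sizes picked for illustration; the source does not state which output resolutions Veo 3.1 supports.

```python
from fractions import Fraction
from typing import Tuple


def vertical_frame(height: int, aspect: Fraction = Fraction(9, 16)) -> Tuple[int, int]:
    """Return (width, height) in pixels for a width:height aspect ratio.

    Raises ValueError when the height does not give a whole-pixel width.
    """
    width = height * aspect  # exact rational arithmetic, no float rounding
    if width.denominator != 1:
        raise ValueError(f"height {height} does not give a whole-pixel width at {aspect}")
    return int(width), height


if __name__ == "__main__":
    print(vertical_frame(1920))  # (1080, 1920)
    print(vertical_frame(1280))  # (720, 1280)
```

Using `Fraction` keeps the ratio exact, so a height that cannot produce a whole-pixel width is rejected instead of being silently rounded.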

Technical improvements include better motion quality, more accurate prompt adherence, and more realistic physics. The model represents a shift from purely generative AI to a more collaborative tool where creators provide specific visual direction. This "show, don't just tell" approach makes AI video generation more practical for professional content creation, especially for creators who need consistent branding and specific visual styles across their mobile-first content.
