
What I've Learned in the Past Year Spent Building an AI Video Editor - Make Art with Python

After getting laid off, a founder spent a year building an AI video editor, only to realize the real opportunity isn't adding AI to existing workflows—it's reimagining video as personalized "generators" that create unique content for each viewer, like code instead of static files.


• Bolting AI features onto video editors misses the point: the entire creation paradigm needs rethinking, with LLMs and computer vision as collaborators rather than tools
• Switched from Promptflow to Temporal for ML workflows because durable execution with automatic retries beats static DAGs when building generative pipelines
• Vector databases are overhyped: embeddings compress data into fixed-length vectors trained on generic corpora, and often perform worse than traditional search with domain-specific weights (e.g., using Billboard chart history to rank song searches)
• Metaprompting (having LLMs generate their own prompts) actually works, and Anthropic's Workbench helps iterate on prompt design faster
• Building something new means accepting detours: the author spent six months on rejected SBIR grants for pedestrian safety robotics while learning the messy reality of AI development
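The Billboard ranking idea from the bullets above can be sketched as a toy ranker. The song data, field names, and weighting scheme below are all illustrative assumptions, not the post's actual implementation; the point is just that a crude lexical match plus a domain signal can out-rank a generic embedding for a query like "stop".

```python
def rank_songs(query, songs):
    """Sort songs by a lexical match score boosted by chart history.

    Each song is a dict with (hypothetical) keys "title" and
    "weeks_on_chart". Real chart data would come from Billboard.
    """
    def score(song):
        title = song["title"].lower()
        q = query.lower()
        # Crude lexical relevance: exact title match > substring > none.
        if title == q:
            text_score = 1.0
        elif q in title:
            text_score = 0.5
        else:
            text_score = 0.0
        # Domain weight: songs with longer chart runs rank higher.
        chart_weight = song["weeks_on_chart"] / 100
        return text_score + chart_weight

    return sorted(songs, key=score, reverse=True)


# Made-up catalog for illustration only.
songs = [
    {"title": "Stop", "weeks_on_chart": 20},
    {"title": "Stop This Train", "weeks_on_chart": 2},
    {"title": "Don't Stop Believin'", "weeks_on_chart": 60},
]

results = rank_songs("stop", songs)
```

Here the exact-match "Stop" wins despite a shorter chart run, while the long-charting "Don't Stop Believin'" beats the obscure substring match; a generic text embedding has no way to encode that popularity signal.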

The author's journey from laid-off engineer to AI video editor builder reveals a crucial insight: adding AI features to existing video editing workflows fundamentally misses the opportunity. After six months building a local-first editor using SAM for object segmentation and diffusion models for animation, they realized that the UI assumptions of traditional editors constrain what's possible. The real shift is reconceiving video as a "generator": dynamic code that produces personalized content for each viewer, rather than a static artifact.

The technical architecture evolved from Microsoft's Promptflow (designed for static RAG flows) to Temporal's workflow engine, which provides durable execution as a primitive. This matters because ML pipelines fail constantly: network requests drop, GPUs aren't available, encodings break. Temporal automatically retries from the last successful step, dramatically speeding up development. The author also found that vector databases and embeddings are oversold: they compress data into fixed-length vectors trained on generic datasets, and often perform worse than traditional search enhanced with domain knowledge. Their example: ranking songs by Billboard chart history beats generic text embeddings for searches like "stop."
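Temporal's actual API is different (workflows, activities, a running server), so the sketch below is only the core durable-execution idea in plain Python: checkpoint after each successful step and retry transient failures, so a rerun resumes where it left off instead of starting the whole pipeline over. Step names and the JSON checkpoint file are illustrative assumptions.

```python
import json
import os


def run_pipeline(steps, checkpoint_path, max_retries=3):
    """Run named steps in order with retries and a persisted checkpoint.

    steps: list of (name, zero-arg callable) pairs.
    A rerun skips any step recorded in the checkpoint file, which is
    roughly what Temporal's durable event history buys you for free.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    for name, fn in steps:
        if name in done:
            continue  # completed on a previous run; skip
        for attempt in range(max_retries):
            try:
                fn()
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # retries exhausted; surface the failure
        done.add(name)
        # Checkpoint after each successful step so a crash here
        # doesn't force redoing earlier work.
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return done
```

With a flaky "encode" step that fails once (a dropped request, a busy GPU), the runner retries it in place and still finishes the pipeline; with Temporal the same retry-and-resume behavior applies across process crashes, not just in-process exceptions.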

The piece includes honest detours: six months spent on rejected SBIR grant proposals for pedestrian safety robotics, the realization that metaprompting (having LLMs generate their own prompts) surprisingly works, and the ongoing question of whether building something new alone is naive compared with the safety of employment. The core lesson: building with AI means mixing embeddings, traditional search, and domain expertise, and accepting that the obvious approach is usually wrong.