
DeepSeek V4—almost on the frontier, a fraction of the price

DeepSeek just released the largest open weights model ever (1.6T parameters). It performs within 3-6 months of GPT-5.4 and Gemini-3.1-Pro but costs 10-20x less, with efficiency breakthroughs that cut long-context compute to 10-27% of what V3.2 requires.

Summary

• DeepSeek-V4-Pro is now the largest open weights model at 1.6T total parameters (49B active), beating Kimi K2.6 and GLM-5.1
• Pricing undercuts all frontier models: Flash is $0.14/$0.28 per million input/output tokens and Pro is $1.74/$3.48, making Flash the cheapest frontier-class model available
• Massive efficiency gains for long context: V4-Pro uses only 27% of the FLOPs and 10% of the KV cache compared to V3.2 at 1M tokens; Flash is even more efficient at 10%/7%
• Performance trails GPT-5.4 and Gemini-3.1-Pro by approximately 3-6 months according to DeepSeek's own benchmarks
• Flash model (160GB) may run on consumer hardware like 128GB MacBook Pros with quantization

DeepSeek has released two V4 preview models that fundamentally change the economics of frontier AI. DeepSeek-V4-Pro (1.6T total parameters, 49B active) is now the largest open weights model available, more than twice the size of their previous V3.2 release. The Flash variant (284B total, 13B active) is designed for efficiency while maintaining competitive performance. Both models support 1 million token context windows and use MIT licensing.
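A back-of-the-envelope sketch (not from DeepSeek's release notes) helps explain why a 1.6T-parameter mixture-of-experts model can be served so cheaply: per-token compute scales with the active parameters, not the total. Using the common approximation of roughly 2 FLOPs per active parameter per forward pass, and a hypothetical dense 1.6T model as the comparison point:

```python
# Rough per-token inference cost for a sparse MoE, using the common
# approximation that one forward pass costs ~2 FLOPs per active parameter.
# Only the parameter counts quoted in the article are taken as given;
# the dense comparison model is hypothetical.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate FLOPs to generate one token."""
    return 2 * active_params

models = {
    "DeepSeek-V4-Pro (1.6T total, 49B active)": 49e9,
    "DeepSeek-V4-Flash (284B total, 13B active)": 13e9,
    "Hypothetical dense 1.6T model": 1.6e12,
}

for name, active in models.items():
    print(f"{name}: ~{forward_flops_per_token(active):.2e} FLOPs/token")

# V4-Pro routes only ~3% of its 1.6T parameters per token, so its per-token
# compute is closer to a ~50B dense model than to a dense 1.6T one.
```

That sparsity is what lets total parameter count (and with it, capability) grow far faster than serving cost.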

The real breakthrough is in efficiency and pricing. DeepSeek has achieved dramatic reductions in computational requirements for long context windows—V4-Pro uses only 27% of the FLOPs and 10% of the KV cache compared to V3.2 at 1M tokens, while Flash pushes this even further to 10% FLOPs and 7% KV cache. This translates to revolutionary pricing: Flash costs $0.14/$0.28 per million tokens (input/output), making it cheaper than even GPT-5.4 Nano, while Pro costs $1.74/$3.48, undercutting all major frontier models by 10-20x. DeepSeek's own benchmarks show Pro performing competitively with GPT-5.2 and Gemini-3.0-Pro, though trailing GPT-5.4 and Gemini-3.1-Pro by approximately 3-6 months.
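To make the pricing gap concrete, here is a small cost calculator for a hypothetical monthly workload. The Flash and Pro prices are the ones quoted above; the workload size and the "frontier model" comparison price are illustrative assumptions, not real quotes.

```python
# Illustrative cost comparison. Flash/Pro prices are from the article;
# the frontier-model price is an assumed placeholder for comparison only.

PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "DeepSeek-V4-Flash": (0.14, 0.28),
    "DeepSeek-V4-Pro": (1.74, 3.48),
    "Assumed frontier model": (20.00, 60.00),  # placeholder figure
}

def monthly_cost(input_tokens: float, output_tokens: float,
                 price_in: float, price_out: float) -> float:
    """Dollar cost for the given token volumes at per-million-token prices."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# Example workload: 2B input tokens and 500M output tokens per month.
for model, (p_in, p_out) in PRICES.items():
    cost = monthly_cost(2e9, 5e8, p_in, p_out)
    print(f"{model}: ${cost:,.0f}/month")

# DeepSeek-V4-Flash:        $420/month
# DeepSeek-V4-Pro:          $5,220/month
# Assumed frontier model:   $70,000/month
```

Under these assumptions the gap between Pro and a typical frontier price lands in the same 10-20x range the article cites, and Flash is another order of magnitude below that.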

The implications are significant: you can now get near-frontier performance at a fraction of the cost, and because these are open weights models, you can potentially run them on your own hardware. The Flash model at 160GB may even run on high-end consumer machines with quantization, democratizing access to frontier-class AI capabilities.
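Whether Flash really fits on a 128GB machine comes down to quantization. A quick estimate of weight memory at different bit-widths (ignoring KV cache, activations, and runtime overhead, so real requirements will be somewhat higher) looks like this:

```python
# Rough weight-memory estimate for the 284B-parameter Flash model at
# different quantization levels. KV cache, activations, and runtime
# overhead are ignored, so treat these as lower bounds.

TOTAL_PARAMS = 284e9  # DeepSeek-V4-Flash total parameters

def weights_gb(bits_per_param: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes) at a given bit-width."""
    return TOTAL_PARAMS * bits_per_param / 8 / 1e9

for bits in (8, 4.5, 4, 3):
    print(f"{bits}-bit: ~{weights_gb(bits):.0f} GB")

# 8-bit:   ~284 GB  -- well beyond consumer hardware
# 4.5-bit: ~160 GB  -- roughly the figure quoted in the article
# 4-bit:   ~142 GB  -- still above 128 GB of unified memory
# 3-bit:   ~107 GB  -- the regime where a 128 GB MacBook Pro becomes plausible
```

So "may run on a 128GB MacBook Pro" most likely means aggressive quantization (around 3 bits per weight) or partial offloading of experts, with some quality cost that remains to be measured.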