DeepSeek's surprisingly inexpensive AI model challenges industry norms. The company claims to have trained its powerful DeepSeek V3 neural network for a mere $6 million using 2,048 GPUs, significantly undercutting competitors. However, this figure covers only the GPU cost of pre-training and omits substantial spending on research, refinement, data processing, and infrastructure.
DeepSeek V3's innovative architecture is key to its efficiency. It utilizes:
- Multi-token Prediction (MTP): Predicting several future tokens at once rather than one at a time, for improved accuracy and speed.
- Mixture of Experts (MoE): Employing 256 expert sub-networks and activating only eight of them for each token, which accelerates training and enhances performance.
- Multi-head Latent Attention (MLA): Compressing the attention keys and values into a compact latent representation, which cuts memory use while still capturing crucial nuances.
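To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k routing with the two figures quoted above (256 experts, eight active per token). Everything else is an assumption made for illustration: the toy hidden size, the hypothetical `make_expert` helper, the random router weights, and the softmax gate over the selected experts. It is not DeepSeek's actual implementation.

```python
import math
import random

NUM_EXPERTS = 256  # expert count quoted in the article
TOP_K = 8          # experts activated per token, as quoted in the article
DIM = 4            # toy hidden size (assumption, far smaller than a real model)


def make_expert(rng):
    """Hypothetical toy expert: a per-dimension scale and bias."""
    scale = [rng.uniform(0.5, 1.5) for _ in range(DIM)]
    bias = [rng.uniform(-0.1, 0.1) for _ in range(DIM)]
    return lambda x: [s * v + b for s, v, b in zip(scale, x, bias)]


def route(token, router_weights, k=TOP_K):
    """Return (expert_index, gate) pairs for the k highest-scoring experts."""
    # One affinity score per expert: dot product of its router row with the token.
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (an assumed gating choice).
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(token, experts, router_weights):
    """Run only the selected experts and blend their outputs by gate weight."""
    selected = route(token, router_weights)
    out = [0.0] * DIM
    for idx, gate in selected:
        expert_out = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out, selected


rng = random.Random(0)
experts = [make_expert(rng) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [0.2, -0.4, 0.1, 0.7]

output, selected = moe_forward(token, experts, router_weights)
print(f"experts used: {len(selected)} of {NUM_EXPERTS}")
```

The point of the design is that the other 248 experts are never evaluated for this token, so compute per token scales with the eight active experts rather than with the full expert count.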
Despite the low training cost claim, SemiAnalysis revealed DeepSeek's substantial infrastructure: approximately 50,000 Nvidia Hopper GPUs (including 10,000 H800, 10,000 H100, and additional H20 GPUs) spread across multiple data centers. This represents a total server investment of roughly $1.6 billion, with operational costs estimated at $944 million. This contrasts sharply with the publicized $6 million pre-training cost.
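To put that contrast in numbers, a quick back-of-the-envelope calculation using only the three dollar figures quoted above may help. The variable names are illustrative, and the figures are the estimates reported here, not audited costs.

```python
# Dollar figures as quoted in the article (estimates, not audited numbers).
PRETRAIN_COST = 6e6    # publicized pre-training GPU cost
SERVER_CAPEX = 1.6e9   # estimated total server investment
OPERATING_COST = 944e6 # estimated operational costs

# How many times larger each infrastructure figure is than the headline cost.
capex_multiple = SERVER_CAPEX / PRETRAIN_COST
opex_multiple = OPERATING_COST / PRETRAIN_COST

# The headline cost as a share of the server investment.
pretrain_share = PRETRAIN_COST / SERVER_CAPEX

print(f"server investment is about {capex_multiple:.0f}x the pre-training cost")
print(f"operating costs are about {opex_multiple:.0f}x the pre-training cost")
print(f"pre-training cost is about {pretrain_share:.2%} of the server investment")
```

On these estimates, the publicized pre-training figure amounts to well under one percent of the hardware investment behind it.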
DeepSeek, a subsidiary of the Chinese hedge fund High-Flyer, owns its data centers, which gives it full control over its hardware and lets it put new ideas into practice faster. Being self-funded adds to its agility. The company attracts top Chinese talent, with some researchers earning over $1.3 million annually. DeepSeek's cost-effectiveness is therefore relative: its success stems from substantial investment, technological advances, and a highly skilled team.
The company's overall investment in AI development exceeds $500 million. Its streamlined structure lets it innovate more efficiently than larger, more bureaucratic organizations. While the "revolutionary budget" narrative is arguably inflated, DeepSeek's model training costs ($5 million for R1) still significantly undercut those of competing models such as GPT-4o ($100 million). Ultimately, DeepSeek demonstrates the potential of a well-funded, independent AI company to compete effectively with established giants.
