DeepSeek Transforms AI through Open Large Language Models

Introduction to DeepSeek’s Innovations
DeepSeek is a Chinese AI company that made headlines with the launch of two open large language models (LLMs): DeepSeek-V3 in December 2024 and DeepSeek-R1 in January 2025. Both models are free to download and modify, encouraging broader usage and experimentation. Following the releases, a chatbot app based on the models quickly became popular, reaching the top of Apple’s App Store. The remarkable performance of DeepSeek’s models also rattled markets: on January 27, 2025, stock prices of major AI companies fell sharply, with roughly $600 billion wiped off Nvidia’s market capitalization alone.
Growth of Open AI Models
The response to DeepSeek’s open AI models has been overwhelmingly positive. To date, more than 700 models based on DeepSeek-V3 and R1 have appeared on the AI community platform Hugging Face, collectively amassing over 5 million downloads. The models’ achievements, particularly given the limitations imposed by American export regulations, have garnered praise from experts in the field. For instance, Cameron R. Wolfe, a senior research scientist at Netflix, noted that the models are competitive with leading closed models.
Training Costs and Hardware Limitations
How DeepSeek-V3 Was Developed
A critical aspect of DeepSeek’s innovation lies in how it managed the hardware restrictions set by U.S. export controls. The company claims it spent approximately $5.6 million training the DeepSeek-V3 model on Nvidia’s H800 chips, a deliberately limited GPU designed to comply with those restrictions. To work around the H800’s reduced chip-to-chip bandwidth, DeepSeek developed a specialized pipeline-scheduling algorithm called DualPipe, which uses low-level GPU programming to overlap computation with inter-GPU communication so that data transfers do not stall training.
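The core principle, hiding communication latency behind useful computation, can be illustrated with a toy example. The sketch below is a simplified analogy in plain Python, not DeepSeek’s implementation; the timings and single-threaded executor are stand-ins for GPU kernels and interconnect transfers.

```python
# Toy illustration of compute/communication overlap, the idea behind
# pipeline schedules such as DualPipe. Sleeps stand in for GPU work
# and interconnect transfers; this is not DeepSeek's actual code.
import time
from concurrent.futures import ThreadPoolExecutor

def compute(micro_batch: int) -> int:
    time.sleep(0.05)  # pretend: forward/backward pass for one micro-batch
    return micro_batch

def communicate(result: int) -> None:
    time.sleep(0.05)  # pretend: sending activations/gradients between GPUs

def serial(n: int) -> float:
    start = time.perf_counter()
    for i in range(n):
        communicate(compute(i))  # each transfer blocks the next compute
    return time.perf_counter() - start

def overlapped(n: int) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for i in range(n):
            result = compute(i)  # micro-batch i computes while i-1 transfers
            if pending:
                pending.result()
            pending = comm.submit(communicate, result)
        pending.result()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"serial:     {serial(8):.2f}s")      # ~0.80 s
    print(f"overlapped: {overlapped(8):.2f}s")  # ~0.45 s: transfers are hidden
```

On slower interconnects like the H800’s, the communication slice is larger, which is exactly when this kind of overlap pays off most.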
Technical Specifications
DeepSeek-V3 features an architecture known as mixture-of-experts (MoE): instead of running the entire network for every input, a routing layer activates only a small subset of specialized neural networks, or “experts,” for each token. This design reduces memory and compute requirements during training and cuts operational costs after deployment. Of the model’s 671 billion total parameters, only about 37 billion are active for any given token, yet DeepSeek-V3 matches or surpasses benchmarks set by competitors such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.
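A minimal sketch of top-k expert routing, the mechanism behind MoE layers generally, is shown below. The dimensions and two-expert routing are toy values for illustration, not DeepSeek-V3’s actual configuration.

```python
# Minimal mixture-of-experts (MoE) layer with top-k routing, using toy
# sizes. Only the k chosen experts run per token, which is why MoE
# models can have huge total parameter counts but modest per-token cost.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                               # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = logits[t, chosen[t]]
        gates = np.exp(gates - gates.max())
        gates /= gates.sum()                          # softmax over chosen experts
        for gate, e in zip(gates, chosen[t]):
            out[t] += gate * (x[t] @ experts[e])      # only these experts run
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16): top_k/n_experts of the expert FLOPs
```

Because only top_k of n_experts matrices are multiplied per token, compute scales with the active parameters rather than the total count, mirroring the 37-billion-of-671-billion ratio described above.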
The DeepSeek-R1 Model
In addition to DeepSeek-V3, the company unveiled DeepSeek-R1, which can perform multi-step reasoning tasks comparable to OpenAI’s o1 models. Unlike most reasoning models, which begin with traditional supervised fine-tuning (SFT), DeepSeek initially trained R1 with reinforcement learning (RL) alone. After encountering quality problems with that approach, the team added a “cold start” stage: a supervised pass over a small curated dataset before transitioning to RL to finish training.
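Reasoning-focused RL pipelines commonly score model outputs with simple programmatic rewards, for example rewarding a correct final answer and a well-formed chain of thought. The function below is a generic illustration of that idea; the tags, regex patterns, and weights are assumptions for the sketch, not DeepSeek’s published reward code.

```python
# Illustrative rule-based reward for reasoning RL: reward well-formed
# chain-of-thought markup plus a correct final answer. The <think> tags,
# \boxed{} convention, and weights are assumptions, not DeepSeek's code.
import re

def reward(completion: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: reasoning should appear inside think tags.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: the boxed final answer must match the reference.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

sample = "<think>2 + 2 = 4</think> The answer is \\boxed{4}."
print(reward(sample, "4"))  # 1.5: both format and accuracy rewards earned
```

Because rewards like these can be checked automatically, the RL stage can train on large volumes of problems without human labeling.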
Practical Applications of DeepSeek Models
Adoption in Real-World Scenarios
Rajkiran Panuganti, senior director of generative AI applications at the Indian firm Krutrim, highlights the practical advantages of DeepSeek’s models. Krutrim serves its clients with a variety of open AI models, and Panuganti says he would certainly recommend DeepSeek for future projects. While he acknowledges that DeepSeek’s models don’t always outperform the leading closed models on the toughest tasks, he finds them significantly more cost-effective.
Panuganti points out that DeepSeek’s models, especially DeepSeek-R1, cost far less to use than more advanced models like OpenAI’s. Even through DeepSeek’s paid commercial API, the potential savings are considerable. Furthermore, DeepSeek is making its models easier to deploy by releasing “distilled” versions: smaller models trained on R1’s outputs that run efficiently on less powerful devices.
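For developers, the hosted models are reachable through an OpenAI-compatible chat-completions interface. The sketch below assumes the model names and base URL from DeepSeek’s public API documentation; check the current docs before relying on them.

```python
# Querying DeepSeek's hosted API via the OpenAI-compatible client
# (pip install openai). Model names and base URL are taken from
# DeepSeek's public docs and may change; verify before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # serves DeepSeek-R1; "deepseek-chat" serves V3
    messages=[{"role": "user", "content": "How many primes are there below 30?"}],
)
print(response.choices[0].message.content)
```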
Accessibility and Community Support
DeepSeek also provides a straightforward way to access its models: the weights can be downloaded and modified under a permissive license. This openness has allowed enthusiasts and developers to experiment with the models on everyday devices. For example, popular tools such as Ollama support DeepSeek’s models, letting users run DeepSeek-R1’s distilled variants on a personal computer, as in the sketch below.
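A minimal local sketch, assuming Ollama is installed, its Python client is available (pip install ollama), and a DeepSeek-R1 variant has already been pulled (for example with `ollama pull deepseek-r1`):

```python
# Chatting with a locally served DeepSeek-R1 variant through Ollama's
# Python client. Assumes the Ollama daemon is running and the model
# tag "deepseek-r1" has been pulled; available tags may differ.
import ollama

reply = ollama.chat(
    model="deepseek-r1",  # a distilled variant sized for local hardware
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
)
print(reply["message"]["content"])
```

Because the distilled variants are much smaller than the full 671-billion-parameter model, they can fit in the memory of a single consumer GPU or even a laptop, at some cost in capability.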
The Debate Around Open Source
While DeepSeek promotes openness, the open-source community has raised some criticisms. The company does not disclose the datasets or release the complete training code used to build its models, leaving important details obscured. This lack of transparency has stirred debate about what it truly means for a model to be “open.” Organizations like the Open Source Initiative have previously called out similar practices by other companies in the industry.
Hugging Face is attempting to bridge this gap with an initiative called Open-R1, which aims to create a fully open-source reproduction of DeepSeek-R1. Experts in the field stress that recreating the models will be demanding, but the excitement surrounding DeepSeek has captured the interest of researchers, companies, and individuals alike, pointing to a broader trend toward openness in AI development.