SambaNova Achieves 198 Tokens Per Second with Full DeepSeek-R1 671B Using Just 16 SN40L RDU Chips

SambaNova and DeepSeek: A New Era in AI Performance
Key Highlights
- Speed and Efficiency: SambaNova reports that DeepSeek-R1 runs at an impressive 198 tokens per second using just 16 of its specialized chips.
- Advanced Hardware: SambaNova touts its SN40L RDU chip as three times faster and five times more efficient than conventional GPUs.
- Future Predictions: SambaNova anticipates a further fivefold speed increase soon and plans to expand DeepSeek-R1 capacity on its cloud platform one hundredfold by the end of the year.
The Rise of DeepSeek
In 2025, Chinese AI company DeepSeek has quickly established itself in the artificial intelligence landscape. Their R1 language model, designed for complex reasoning tasks, delivers top-tier performance comparable to the best models in the industry while being economically viable.
SambaNova Systems, an AI startup founded in 2017 by veterans of Sun/Oracle and Stanford University, claims to have achieved the fastest deployment of the DeepSeek-R1 model, which has 671 billion parameters.
Groundbreaking Deployment
According to SambaNova, the company has achieved a remarkable output of 198 tokens per second per user using only 16 custom-built chips. This single-rack setup replaces the extensive infrastructure traditionally required, often 40 racks housing 320 Nvidia GPUs.
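To put the per-user figure in perspective, a quick back-of-envelope calculation shows what 198 tokens per second means for latency. The 1,000-token response length below is an illustrative assumption, not a figure from SambaNova:

```python
# Back-of-envelope latency estimate at SambaNova's reported decode rate.
TOKENS_PER_SECOND = 198    # reported per-user output speed
RESPONSE_TOKENS = 1_000    # hypothetical length of a reasoning-model answer

latency_seconds = RESPONSE_TOKENS / TOKENS_PER_SECOND
print(f"A {RESPONSE_TOKENS}-token response takes ~{latency_seconds:.1f} s")
```

At roughly five seconds for a thousand tokens, even responses with long chains of intermediate reasoning remain interactive.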
Bold Performance Claims
Rodrigo Liang, co-founder and CEO of SambaNova, emphasized the efficiency of their system. He stated, "Powered by the SN40L RDU chip, SambaNova is the fastest platform running DeepSeek. Our performance will increase to five times faster than the latest GPU speeds on a single rack, and we will roll out a hundred times more capacity for DeepSeek-R1 by the end of the year."
SambaNova argues that their unique reconfigurable dataflow architecture is more efficient than the traditional reliance on Nvidia GPUs. They claim their hardware delivers three times the speed and five times the efficiency, all while maintaining the sophisticated reasoning capabilities of DeepSeek-R1.
Independent Verification and the Importance of Output Speed
George Cameron, co-founder of Artificial Analysis, an AI evaluation firm, corroborated SambaNova's claims and highlighted the significance of high output speeds. His team independently benchmarked SambaNova's full deployment of the DeepSeek-R1 model at over 195 output tokens per second. Cameron noted that high-speed generation is especially valuable for reasoning models, since they produce long chains of intermediate tokens before arriving at a final answer, so faster token generation directly improves response time. "SambaNova's high output speeds will facilitate the use of reasoning models in applications that require minimal latency," he explained.
Availability and Future Plans
The DeepSeek-R1 671B model is now accessible on the SambaNova Cloud, with API access offered to select early users. The company is scaling rapidly and expects to reach a total throughput of 20,000 tokens per second in the near future.
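Taking the 20,000 tokens-per-second aggregate target and the 198 tokens-per-second per-user figure at face value, a rough sketch suggests how many users the platform could serve at full speed. The even-division assumption is ours, not SambaNova's:

```python
# Rough concurrency estimate from the two reported figures,
# assuming throughput divides evenly across simultaneous users.
TOTAL_TOKENS_PER_SECOND = 20_000    # targeted aggregate throughput
PER_USER_TOKENS_PER_SECOND = 198    # reported per-user speed

concurrent_users = TOTAL_TOKENS_PER_SECOND // PER_USER_TOKENS_PER_SECOND
print(f"~{concurrent_users} users at full per-user speed")
```

In other words, the stated target would support on the order of a hundred simultaneous users without degrading individual generation speed.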
As AI technology advances, the implications for applications ranging from natural language processing to high-demand computational workloads are profound. SambaNova's approach not only highlights the potential of custom hardware but also marks a significant shift in how large AI models can be efficiently deployed in real-world scenarios.