DeepSeek-R1: Financial Considerations for On-Premise Deployments

Examining Cybersecurity and Cost Challenges of AI Models
Understanding AI Models and Cybersecurity Risks
In today’s tech landscape, IT leaders are increasingly concerned about the cybersecurity risks of giving users direct access to cloud-hosted large language models (LLMs) such as ChatGPT. A viable alternative is to run open-source LLMs on-premise or in a private cloud, which gives organizations far greater control over data security.
The Hardware Requirements for AI Models
Large AI models require extensive hardware. Their weights must typically be held entirely in memory so that every parameter is available for fast inference. Organizations that choose graphics processing units (GPUs) for AI acceleration therefore face a significant investment in cards with enough combined memory to hold the model.
Noteworthy is Nvidia’s high-end AI accelerator, the H100, which pairs 80 gigabytes of high-bandwidth memory with a power draw of up to 350 watts in its PCIe form. Deploying a model such as R1, the LLM developed by China’s DeepSeek, entails more than a single GPU purchase, however.
Memory and GPU Costs
DeepSeek’s R1 model comprises a staggering 671 billion parameters. Running it entirely in memory calls for 768 gigabytes of RAM, which would typically mean 10 Nvidia H100 GPUs at 80 gigabytes each, for a total hardware cost of around $250,000 before any volume discounts.
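As a back-of-the-envelope sketch, the figures above are consistent with 8-bit quantization, at roughly one byte per parameter (an assumption, not stated in the text), with the headroom in the 768 GB total covering activations and key-value cache:

```python
import math

# Rough memory sizing for a 671B-parameter model. One byte per
# parameter assumes 8-bit quantization; the extra headroom in the
# 768 GB figure covers activations and KV cache.

def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Raw weight footprint in gigabytes."""
    return params_billions * bytes_per_param

def gpus_needed(total_gb: float, gb_per_gpu: float = 80) -> int:
    """Whole GPUs required to hold the model in memory (80 GB per H100)."""
    return math.ceil(total_gb / gb_per_gpu)

print(model_memory_gb(671, 1))   # 671.0 GB of weights at 8 bits
print(gpus_needed(768))          # 10 H100s for the 768 GB figure
```

The same arithmetic explains why half-precision (two bytes per parameter) deployments of a model this size would roughly double the GPU count.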
While it’s possible to reduce costs by using less powerful GPUs, the overall expenditure still exceeds $100,000 for a capable server. Alternatively, cloud infrastructure could offer some cost relief. For example, Azure provides access to Nvidia H100 GPUs at about $27.17 per hour. With the model in use every working day, annual costs could approach $46,000.
Using less powerful GPU options, such as Google Cloud’s Nvidia T4, could reduce costs further, to around $13,000 annually with a three-year commitment. GPU pricing, in other words, varies widely enough to offer real savings.
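The cloud figures above are simple annualizations of an hourly rate. The hours-per-day value below is an illustrative assumption chosen to match the article’s ~$46,000 figure, not a number from the text:

```python
# Annualizing an hourly cloud GPU rate. The 6.8 hours/day and
# 250 working days are illustrative assumptions, not from the text.

def annual_cost(rate_per_hour: float, hours_per_day: float,
                working_days: int = 250) -> float:
    """Yearly spend for a GPU billed only while in use."""
    return rate_per_hour * hours_per_day * working_days

# Azure H100 at $27.17/hour, used just under 7 hours per working day:
print(round(annual_cost(27.17, 6.8)))   # 46189, close to the $46,000 cited
```

Note that committed-use discounts, like the three-year T4 commitment mentioned above, change the billing model from hourly to reserved, which is why they land so much lower.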
A Cost-Effective Alternative: CPUs
IT leaders looking to cut costs further can consider using general-purpose central processing units (CPUs) instead of expensive GPUs. This alternative is especially suitable if the DeepSeek-R1 model is employed only for AI inference tasks.
A recent suggestion from Matthew Carrigan, a machine learning engineer at Hugging Face, outlines how a system based on two AMD Epyc server processors and 768 gigabytes of memory can be assembled for approximately $6,000. Carrigan reported processing speeds of six to eight tokens per second, depending on factors including memory speed and query length.
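The six-to-eight tokens per second figure is consistent with inference being bound by memory bandwidth rather than compute. A rough upper-bound sketch, assuming a dual-socket board with 12 DDR5-4800 channels per socket and R1’s mixture-of-experts design activating about 37 billion parameters per token at 8-bit precision (all assumptions, not figures from the text):

```python
# Rough throughput ceiling for memory-bandwidth-bound CPU inference.
# Assumptions (not from the text): 12 DDR5-4800 channels per socket,
# two sockets, and ~37B active parameters per token at 8 bits.

CHANNEL_BW_GBS = 38.4            # GB/s per DDR5-4800 channel
channels = 2 * 12                # two Epyc sockets, 12 channels each
active_param_gb = 37             # bytes read per generated token

peak_bw = CHANNEL_BW_GBS * channels     # ~921.6 GB/s aggregate bandwidth
ceiling = peak_bw / active_param_gb     # theoretical tokens per second
print(round(ceiling, 1))                # ~24.9 tokens/s upper bound
```

Real systems land well below such a ceiling because of NUMA effects, cache misses and attention overhead, so an observed six to eight tokens per second is plausible.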
The trade-off is speed: while CPUs can cut costs dramatically, they cannot match the inference throughput of GPUs.
Innovative Solutions for Memory Costs
There are also innovative approaches to managing memory costs when deploying powerful AI models. One such method has been adopted by SambaNova, which uses a custom chip with multi-tier memory architecture. This technology allows for running complex AI models like DeepSeek-R1 more efficiently, reducing the required hardware from 40 racks down to just one.
Companies like SambaNova are turning the traditional reliance on extensive GPU setups on its head, demonstrating that effective alternatives exist.
SambaNova’s efforts are highlighted by their collaboration with Saudi Telecom, demonstrating how governments can explore varied strategies for building sovereign AI capabilities without solely depending on expensive GPU configurations.
Such developments suggest that organizations can achieve high performance while controlling costs and keeping security at the center of their deployment strategies.