Executing the Deepseek-R1 671B Model at FP16 Precision with Virtualized Workloads


Introduction to Deepseek-R1 671B

This year, the Deepseek-R1 announcement generated significant interest in the tech community. It is a powerful reasoning model designed to interpret complex tasks and produce valuable outputs. Running the full 671-billion-parameter model is challenging, however, due to its considerable hardware requirements. The FP16 version of this model demands around 1.3 TB of memory, which exceeds even an eight-GPU NVIDIA H200 system with its 1.128 TB of High Bandwidth Memory (HBM, at 141 GB per GPU). It does fit in an eight-GPU AMD Instinct MI325X setup, which offers a total of 2 TB of HBM (256 GB per GPU). Consequently, many users opt for quantization to reduce memory needs, albeit at a cost to output quality. Recently, we discovered a method to run this model and other applications on a workstation-class system without needing a $400,000 GPU server.
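The ~1.3 TB figure follows directly from the parameter count: FP16 stores two bytes per parameter, before counting KV cache and runtime overhead. A quick back-of-the-envelope check:

```shell
# Back-of-the-envelope memory estimate for a 671B-parameter model at FP16.
# FP16 uses 2 bytes per parameter; KV cache and runtime overhead come on top.
PARAMS=671000000000   # 671 billion parameters
BYTES_PER_PARAM=2     # FP16
TOTAL_BYTES=$((PARAMS * BYTES_PER_PARAM))
# Convert to terabytes (decimal).
awk -v b="$TOTAL_BYTES" 'BEGIN { printf "%.2f TB\n", b / 1e12 }'
# → 1.34 TB, in line with the ~1.3 TB figure above
```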

For visual learners, we recorded a video demonstrating this process.

As a note, this project utilizes AMD’s Volcano platform and is sponsored by AMD. Now, let’s dive into running the Deepseek-R1 671B model.

Setting Up Deepseek-R1 671B FP16

Hardware Requirements

Our testing setup incorporates the AMD Volcano platform, featuring:

  • Two liquid-cooled AMD EPYC 9965 CPUs, each with 192 cores and 384 threads (384 cores and 768 threads total)
  • 24 x 64GB DDR5 DIMMs for a total of 1.5 TB of RAM

This configuration demonstrates a significant memory bandwidth of over 1 TB/s, making it well-suited to handle large AI models.
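Before downloading anything, it is worth confirming the host actually exposes the expected cores and memory. A minimal sanity check with standard Linux tools:

```shell
# Quick sanity check of CPU and memory resources on the host.
echo "Logical CPUs: $(nproc)"   # expect 768 on dual 192-core EPYC 9965 with SMT
free -g | awk '/^Mem:/ { print "Total RAM (GiB): " $2 }'   # expect roughly 1500 GiB
# NUMA layout matters for memory bandwidth on a dual-socket system:
lscpu | grep -i 'numa node(s)'
```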

Operating System and Environment

To facilitate testing in both bare metal and KVM virtualized environments, we are using Ubuntu 24.04 LTS as our operating system. This setup offers numerous opportunities for performance tuning. We will share a few performance tips and common pitfalls, focusing mainly on a straightforward configuration guide.

Step-by-Step Guide for Using Docker

Installing Docker

Docker provides an efficient way to deploy applications such as Open WebUI, which offers a user-friendly web GUI. For Ubuntu users, installation is straightforward:

  1. Run the command: curl -fsSL https://get.docker.com -o get-docker.sh
  2. Execute the script: sudo sh get-docker.sh
  3. Add your user to the docker group: sudo usermod -aG docker $USER and then log out and back in.

Deploying Open WebUI

After installing Docker, you can use it to run models easily. We recommend deploying Open WebUI with Ollama, using the following command:

docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

This will take a little time to set up. It’s beneficial to think about how you’ll access the server in the meantime. In our setup, we used Tailscale for remote access, simplifying connectivity.
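While the image is setting up, you can confirm the container is healthy (assuming the container name `open-webui` from the command above):

```shell
# Confirm the container is running and the 3000->8080 port mapping took effect.
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}} ({{.Ports}})'
# Tail the startup logs; the UI is ready once the server reports it is listening.
docker logs --tail 20 open-webui
# The web interface is then reachable at http://<server-ip>:3000
```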

Model Downloading Process

The next part involves downloading the Deepseek-R1 671B FP16 model. This specific model can be a bit tricky to access. When you visit the Ollama page for Deepseek-R1, you’ll see the default is the smaller 7B model. However, since we have ample server capacity, we’ll want to access the FP16 model instead.

Make sure to select the “671b-fp16” tag from the dropdown on the Ollama model page. This is a sizable 1.3 TB model, so ensure you have enough disk space before proceeding.
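Since the 671b-fp16 tag weighs in around 1.3 TB, check free space on the filesystem backing Docker first. By default, named volumes such as the `ollama` volume from the run command above live under `/var/lib/docker/volumes` on the host:

```shell
# Check free space where Docker stores its volumes (default: /var/lib/docker).
df -h /var/lib/docker 2>/dev/null || df -h /
# You want comfortably more than 1.3 TB free before pulling 671b-fp16.
```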

Downloading and Running the Model

In the Open WebUI interface, navigate to the admin section, then select settings, and choose models to find your local model directory. You may initially see an empty list. To download the model, click on the download icon in the upper corner of the admin panel. Enter the model name deepseek-r1:671b-fp16 and initiate the download.
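If you prefer the command line, the Ollama instance bundled in the `open-webui:ollama` image can pull the same tag directly (assuming the container name `open-webui` used earlier):

```shell
# Pull the FP16 tag via the Ollama CLI bundled in the open-webui:ollama image.
docker exec -it open-webui ollama pull deepseek-r1:671b-fp16
# List local models to confirm the ~1.3 TB download completed.
docker exec -it open-webui ollama list
```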

Once the download completes, you can return to create a new chat and select the deepseek-r1 model to interact with it effectively.

Additional Features and Capabilities

While not covered here, Open WebUI offers options for integrating web search functions and text-to-speech capabilities for enhanced usage. Now, with the basic setup complete, you’re equipped to start exploring the possibilities of the Deepseek-R1 model.
