DeepSeek on the ZUBoard

Introduction to Large Language Models (LLMs)
In recent years, Large Language Models (LLMs) have gained significant attention in the field of artificial intelligence (AI). These advanced systems are designed to understand and generate human-like text responses from text inputs. The term "large" refers to the extensive amount of data used in their training, which encompasses everything from web pages and books to emails and chat conversations. The scale of this training data is one reason data acquisition is so valuable to organizations, particularly in the social media sector.
The impressive capabilities of LLMs include answering queries, assisting with essay writing, correcting grammar, and even generating code. It's important to note that LLMs operate through deep learning techniques rather than possessing real intelligence. They rely on complex mathematical operations, primarily linear algebra and optimization. Given a prompt, these models generate the most statistically likely text output based on their learned parameters.
Typically, LLMs consist of billions or even tens of billions of parameters whose values are learned during training. Because of this scale, high-performance computing resources, like AMD Instinct™ GPUs, are often required for effective performance. However, there is growing interest in deploying LLMs locally on edge devices, which leads us to explore one such implementation: running the DeepSeek model on the Tria Technologies ZUBoard.
Use Cases for Edge Deployment
Running LLMs on edge devices can present various advantages. Here are some primary use cases to consider:
- Privacy: With the model hosted directly on the device, sensitive information is less likely to be exposed.
- Offline Access: Users can operate the model without needing a stable internet connection.
- Cost Savings: Avoiding cloud data hosting fees can significantly reduce overall operational costs.
Although deploying AI at the edge offers these benefits, it typically involves compromises in model size, response speed, and accuracy, since edge hardware provides far less compute and memory than the cloud.
Understanding DeepSeek
DeepSeek is an innovative suite of high-performance open-weight LLMs and large multimodal models (LMMs) designed to advance open AI research. These models are trained from scratch using vast amounts of data across various formats, including web content, academic publications, codebases, and carefully selected datasets. DeepSeek’s architecture is grounded in transformer designs, incorporating enhancements like Grouped Query Attention (GQA) and SwiGLU activation functions for improved efficiency.
DeepSeek’s models are available in multiple sizes, ranging from 1.5 billion to 13 billion parameters, allowing for flexible trade-offs between resource consumption and accuracy. Furthermore, the DeepSeek-VL (Vision-Language) models add another dimension by combining image inputs with the language model, enabling tasks such as visual question answering and image-text generation.
To accommodate edge and mobile usages, quantized versions (like 4-bit and 8-bit) of the models are available, ensuring that deployment is feasible across various environments.
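To see why quantization matters on a board with only 1GB of RAM, consider a rough back-of-the-envelope estimate for the 1.5-billion-parameter model (weights only, ignoring activations and runtime overhead):
# 16-bit weights: 1.5e9 parameters x 2 bytes   = ~3 GB
# 4-bit weights:  1.5e9 parameters x 0.5 bytes = ~0.75 GB
echo $(( 1500000000 / 2 / 1024 / 1024 ))   # 4-bit weight footprint in MiB, ~715
Even quantized, the model is larger than the ZUBoard's physical RAM, which is exactly why the swap file configured later in this project is essential.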
Installing and Setting Up on the ZUBoard
To initiate our project, we will employ the AMD PYNQ™ platform as the foundation for installing DeepSeek.
Step-by-Step Installation Process:
- Download the PYNQ image for the Avnet ZUBoard from the PYNQ website and transfer it to an SD card.
- Configure boot settings to start from the SD card, then insert the card into the ZUBoard. Establish connections with a USB UART cable and an Ethernet cable.
- Access PYNQ via a browser at http://pynq:9090. The password is "xilinx".
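If the pynq hostname does not resolve on your network, you can log in over the USB UART serial console instead (115200 baud, username and password both "xilinx") and look up the board's IP address directly:
# Show the address assigned to the Ethernet interface
ip addr show eth0 | grep "inet "
Then browse to http://<board-ip>:9090 from your host machine.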
Now that PYNQ is up and running, open a terminal to proceed with the installation of DeepSeek.
Preparing the Environment
Since the ZUBoard has limited memory (1GB of LPDDR4), it’s essential to configure a swap file on the SD card to prevent crashes due to low memory. A swap file serves as additional virtual memory when RAM alone is insufficient. The following terminal commands check for an existing swap file, remove it, and replace it with a 4GB one:
swapon --show
sudo swapoff /var/swap
sudo rm /var/swap
sudo fallocate -l 4G /var/swap
sudo chmod 600 /var/swap
sudo mkswap /var/swap
sudo swapon /var/swap
After running these commands, swapon --show should confirm that 4GB of swap space is active.
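You can also check overall memory availability with free -h. Note that a swap file created this way is not automatically persistent; as a sketch (assuming the PYNQ image does not already include an /etc/fstab entry for /var/swap), you could make it survive reboots with:
# Append an fstab entry only if one does not already exist
grep -q /var/swap /etc/fstab || echo '/var/swap none swap sw 0 0' | sudo tee -a /etc/fstab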
Installing and Testing DeepSeek with Ollama
To install the DeepSeek model, we will utilize the Ollama framework, which allows for easy local deployment of LLMs. Start with the installation by executing the following command:
curl -fsSL https://ollama.com/install.sh | sh
You can disregard any GPU-related warnings, as the ZUBoard does not have one. Once installed, you can prepare to download the DeepSeek model:
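Before pulling the model, it is worth confirming the installation succeeded. On Linux, the install script normally registers Ollama as a systemd service:
ollama --version
systemctl status ollama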
Create a directory for the model and point the HOME variable at it (Ollama stores pulled models under $HOME/.ollama, so this keeps the download in a location you control):
mkdir deepseek
export HOME=/home/xilinx/deepseek
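As an alternative to overriding HOME, Ollama also reads the OLLAMA_MODELS environment variable to locate its model store. If Ollama is running as a systemd service, the variable needs to be set in the service environment rather than your interactive shell; a sketch of that approach:
# Open an override file for the service and add, under [Service]:
#   Environment="OLLAMA_MODELS=/home/xilinx/deepseek"
sudo systemctl edit ollama.service
sudo systemctl restart ollama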
Download the DeepSeek model with the command:
ollama pull deepseek-r1:1.5b
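The 1.5-billion-parameter model is roughly a gigabyte, so the download can take some time. Once it completes, confirm the model is available locally:
ollama list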
Running and Interacting with DeepSeek
With the installation complete, you can now start the DeepSeek model on your ZUBoard using the command:
ollama run deepseek-r1:1.5b
The interface will prompt for input, allowing you to interact with the model as you would with other LLMs. To monitor resource usage during operation, run the top command in another terminal.
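Beyond the interactive prompt, Ollama also exposes a local REST API (on port 11434 by default), which is useful for scripting interactions or querying the board from another machine on the network. A minimal example using the model pulled above:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
With "stream" set to false, the response arrives as a single JSON object once generation completes; given the ZUBoard's modest CPU and reliance on swap, expect this to take a while.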