Personalize DeepSeek-R1 Distilled Models with Amazon SageMaker HyperPod Recipes – Part 1

Customizing DeepSeek Models with Amazon SageMaker

Organizations across many sectors are increasingly using generative AI foundation models (FMs) to improve their applications, and they often need to adapt these models to their specific domains and requirements. Interest in this kind of customization has intensified with the release of new models such as the DeepSeek-R1 family.

Challenges in Customizing DeepSeek Models

While the potential of DeepSeek models stands out, customizing them effectively involves a number of hurdles. Teams need not only deep technical skills but also the ability to choose the right fine-tuning approach, tune hyperparameters, and manage distributed training infrastructure. As a result, organizations often struggle to balance optimal model performance against practical implementation constraints, which underlines the demand for simpler customization workflows.

Streamlining Customization: A Two-Part Series

In the first post of this two-part series, we explore how pre-built fine-tuning workflows, or “recipes,” integrated into Amazon SageMaker HyperPod reduce the complexity of customizing DeepSeek models. This installment details the solution architecture for fine-tuning DeepSeek-R1 distilled models and walks step by step through customizing the DeepSeek-R1 Distill Qwen 7B model, with the aim of achieving notable improvements across various metrics. The follow-up post will focus on fine-tuning the DeepSeek-R1 671B model itself.

Introduction to Amazon SageMaker HyperPod Recipes

At re:Invent 2024, AWS announced the general availability of Amazon SageMaker HyperPod recipes. These recipes help data scientists and developers of all skill levels speed up training for widely used generative AI models. Designed to optimize training performance, they include a training stack validated by Amazon Web Services (AWS), sparing users the tedious work of testing different model configurations. By automating several key functions—loading training data, applying distributed training techniques, creating automatic checkpoints for faster recovery, and managing the end-to-end training process—these recipes greatly improve efficiency.

The integration of these recipes with AWS’s robust infrastructure offers a reliable environment for fine-tuning models such as DeepSeek-R1. The recently released recipes let users fine-tune six different DeepSeek models, with options for supervised fine-tuning (SFT), low-rank adaptation (LoRA), and quantized approaches.

Solution Architecture Overview

The architecture implemented in these recipes relies on a hierarchical workflow. It starts with a recipe specification that comprehensively outlines the training parameters, model structure, and distributed training strategy. The process is then handed off to a HyperPod recipe launcher that manages job execution on the target infrastructure. This launcher interacts with the underlying systems, whether SageMaker HyperPod clusters or SageMaker training jobs, coordinating resource allocation and scheduling.
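To make the recipe specification concrete, the following is a minimal sketch of how such a specification might be inspected, assuming recipes are stored as YAML files as in the public sagemaker-hyperpod-recipes GitHub repository; the file path and section names shown are illustrative, not values taken from this post.

```python
# A minimal sketch for inspecting a recipe specification.
# Assumption: recipes are YAML files; the path and key names below are illustrative.
import yaml

RECIPE_PATH = "recipes_collection/recipes/fine-tuning/deepseek/hf_deepseek_r1_distilled_qwen_7b_fine_tuning.yaml"

with open(RECIPE_PATH) as f:
    recipe = yaml.safe_load(f)

# A recipe bundles training parameters, model structure, and the distributed
# training strategy in one hierarchical document; print the top-level sections.
for section, value in recipe.items():
    keys = list(value.keys()) if isinstance(value, dict) else value
    print(f"{section}: {keys}")
```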

Using SageMaker HyperPod

To run jobs on SageMaker HyperPod, users can employ the HyperPod recipes launcher, which supports execution on both Slurm and Kubernetes clusters. By selecting the appropriate orchestrator, users can launch recipe-specific jobs tailored to their cluster setup and optimize the training process for their operating environment.

Preparing for Training

Before launching a fine-tuning job, a few prerequisites must be in place:

  1. SageMaker quota requests: Request sufficient service quotas for the required compute instances, notably the ml.p4d.24xlarge, which provides 8 NVIDIA A100 GPUs per instance for training (a quota-check sketch follows this list).
  2. Cluster Setup: Utilize the AWS CloudFormation template or follow documentation to establish a HyperPod Slurm cluster, or alternatively, set up an Amazon SageMaker Studio domain for Jupyter notebook access.
  3. Clone Necessary Repositories: Download the required files from the GitHub repository housing the assets for this deployment.
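The quota check below is a minimal sketch using boto3 and the Service Quotas API; the exact quota names returned for SageMaker instance types, and the region, are assumptions to verify against your own account.

```python
# A minimal sketch for listing current SageMaker quotas that mention p4d.24xlarge.
# Assumption: quota names include the instance type string (e.g. training job or
# HyperPod cluster usage limits); verify the names in the Service Quotas console.
import boto3

quotas_client = boto3.client("service-quotas", region_name="us-east-1")

paginator = quotas_client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "p4d.24xlarge" in quota["QuotaName"]:
            print(f"{quota['QuotaName']}: {quota['Value']}")
```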

Preparing the Dataset

When preparing the dataset, follow these steps (a minimal tokenization sketch follows the list):

  1. Format and tokenize the FreedomIntelligence/medical-o1-reasoning-SFT dataset to align with the expected input for the DeepSeek-R1 Distill Qwen 7B model.
  2. Load this dataset and segment it into training and validation sets.
  3. Generate tokens and store the prepared dataset into appropriate formats for use in SageMaker.
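The following is a minimal sketch of these steps, assuming the Hugging Face datasets and transformers libraries; the dataset configuration name, column names, prompt template, and output paths are assumptions about the FreedomIntelligence/medical-o1-reasoning-SFT schema and are illustrative rather than confirmed by this post.

```python
# A minimal sketch of formatting and tokenizing the medical reasoning dataset.
# Assumptions: the "en" configuration and the "Question"/"Complex_CoT"/"Response"
# columns; adjust to the actual dataset schema before use.
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load the dataset and carve out a small validation split.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
dataset = dataset.train_test_split(test_size=0.1, seed=42)

def format_and_tokenize(example):
    # Fold question, reasoning trace, and answer into one training text.
    text = (
        f"Question: {example['Question']}\n"
        f"Reasoning: {example['Complex_CoT']}\n"
        f"Answer: {example['Response']}"
    )
    return tokenizer(text, truncation=True, max_length=4096)

train_ds = dataset["train"].map(format_and_tokenize, remove_columns=dataset["train"].column_names)
val_ds = dataset["test"].map(format_and_tokenize, remove_columns=dataset["test"].column_names)

# Persist in a format the training job can read; output paths are illustrative.
train_ds.save_to_disk("./data/train")
val_ds.save_to_disk("./data/validation")
```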

Fine-Tuning Options

Option A: Using SageMaker Training Jobs

With the ModelTrainer class in the SageMaker Python SDK, users can define their training setup by selecting an instance type and specifying storage for model checkpoints. Setting up input channels is equally important so that data flows correctly into the training job.
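The snippet below sketches this option, assuming the ModelTrainer class and its from_recipe helper from the SageMaker Python SDK; the recipe name, container image URI, override keys, and S3 paths are illustrative placeholders rather than values confirmed in this post.

```python
# A minimal sketch of launching a recipe-based SageMaker training job.
# The recipe name, image URI, override keys, and S3 URIs are illustrative placeholders.
from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.configs import Compute, InputData

# Single p4d.24xlarge instance (8x NVIDIA A100 GPUs).
compute = Compute(instance_type="ml.p4d.24xlarge", instance_count=1)

# Overrides mirror the hierarchical keys of the recipe YAML; adjust to the real recipe.
recipe_overrides = {
    "run": {"results_dir": "/opt/ml/model"},
    "model": {
        "data": {
            "train_dir": "/opt/ml/input/data/train",
            "val_dir": "/opt/ml/input/data/validation",
        }
    },
}

image_uri = "<training container image URI from the AWS documentation>"

model_trainer = ModelTrainer.from_recipe(
    training_recipe="fine-tuning/deepseek/hf_deepseek_r1_distilled_qwen_7b_fine_tuning",
    training_image=image_uri,
    recipe_overrides=recipe_overrides,
    compute=compute,
)

# Map the prepared S3 datasets to the train/validation channels and start the job.
model_trainer.train(
    input_data_config=[
        InputData(channel_name="train", data_source="s3://<your-bucket>/deepseek/train"),
        InputData(channel_name="validation", data_source="s3://<your-bucket>/deepseek/validation"),
    ],
    wait=False,
)
```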

Option B: Using SageMaker HyperPod with Slurm

To initiate training on HyperPod, confirm that the cluster is operational and that the environment is set up. The commands outlined in the documentation help create the squash file the job's container runs from and update the recipe configuration files with job-specific parameters.

Evaluating Model Performance

Once fine-tuning is complete, evaluating the model’s performance using ROUGE metrics offers a standardized method for assessing quality. This allows for a structured comparison between the original and fine-tuned models, thereby providing an objective measure of improvement.
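As a minimal sketch, the ROUGE comparison can be computed with the Hugging Face evaluate library; the example predictions and reference below are placeholder strings, not results from this post.

```python
# A minimal sketch of comparing base and fine-tuned model outputs with ROUGE.
# The strings below are placeholders; in practice, use generations from both
# models over the held-out validation set.
import evaluate

rouge = evaluate.load("rouge")

baseline_outputs = ["The patient likely has iron deficiency anemia."]
finetuned_outputs = ["The patient most likely has iron deficiency anemia from chronic blood loss."]
references = ["Iron deficiency anemia secondary to chronic gastrointestinal blood loss."]

baseline_scores = rouge.compute(predictions=baseline_outputs, references=references)
finetuned_scores = rouge.compute(predictions=finetuned_outputs, references=references)

# ROUGE-1/2/L/Lsum F-measures; higher values indicate closer overlap with the reference.
print("baseline :", baseline_scores)
print("finetuned:", finetuned_scores)
```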

In summary, the integration of the SageMaker HyperPod recipes with the DeepSeek models offers an innovative route to simplify the process of fine-tuning and customizing advanced AI solutions. By breaking down complex workflows and harnessing AWS’s powerful infrastructure, organizations can expedite their development and implementation strategies in generative AI.
