DeepSeek Unveils Revolutionary Open-Source Fire-Flyer File System for AI-Optimized Parallel File Management

Introduction to DeepSeek AI’s Fire-Flyer File System
DeepSeek AI, a forward-thinking technology company based in China, recently transitioned its Fire-Flyer File System (3FS) to a fully open-source model during its Open Source Week event. This move positions 3FS as a potentially groundbreaking technology, drawing attention for its ability to achieve 7.3 terabytes per second (TB/s) of read throughput within DeepSeek’s own server data clusters. Since 2019, DeepSeek has relied on 3FS to optimize the organization and efficiency of its servers.
Understanding 3FS: The Unique Parallel File System
3FS is a Linux-based parallel file system specifically designed for high-performance computing (HPC) in artificial intelligence (AI) applications. In such environments, data storage servers are frequently accessed by graphics processing units (GPUs) training large language models (LLMs).
Key Features of 3FS
Random Read Optimization: Unlike traditional file systems, 3FS places a strong emphasis on maximizing random read speeds and largely dispenses with read caching, a notable divergence from standard practice.
- Why Read Caching Is Skipped: When training AI models, compute units continuously request diverse batches of random training data, and each piece of data is typically read only once per pass. This one-off access pattern renders a read cache largely ineffective. Worse, when training LLMs a read cache can actively hurt performance: repeatedly serving the same samples in the same order risks the model spuriously associating unrelated data.
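The access pattern described above can be sketched in a few lines. This is a toy illustration, not 3FS code: it reads every chunk of a scratch file exactly once in shuffled order, using the Linux `posix_fadvise` hints to tell the kernel that readahead and page caching buy nothing here.

```python
import os
import random
import tempfile

CHUNK = 4096          # read granularity in bytes
N_CHUNKS = 256        # scratch file holds N_CHUNKS * CHUNK bytes

# Build a scratch file standing in for a shard of training data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(CHUNK * N_CHUNKS))
    path = f.name

fd = os.open(path, os.O_RDONLY)
# Hint that the access pattern is random, so the kernel skips
# sequential readahead -- the same reasoning 3FS bakes into its design.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)

# Visit each chunk exactly once, in shuffled order: a cache entry
# would never be hit a second time within the pass.
order = list(range(N_CHUNKS))
random.shuffle(order)
total = 0
for i in order:
    data = os.pread(fd, CHUNK, i * CHUNK)
    total += len(data)
    # Drop the pages just read; keeping them cached buys nothing.
    os.posix_fadvise(fd, i * CHUNK, CHUNK, os.POSIX_FADV_DONTNEED)

os.close(fd)
os.remove(path)
print(total == CHUNK * N_CHUNKS)  # every byte read exactly once
```

In a real training cluster, 3FS applies this logic at file-system level across many nodes rather than via per-process hints, but the underlying observation is the same: one-shot random reads make a read cache dead weight.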
Performance Metrics and Comparisons
In a detailed paper published last August, the team operating DeepSeek’s deep learning cluster, Fire-Flyer 2, describes how 3FS is implemented within their framework. The Fire-Flyer 2 system runs 180 storage nodes, each equipped with 16 sixteen-terabyte (TB) solid-state drives (SSDs) and two 200 Gbps network interface cards (NICs). These nodes serve roughly 10,000 Nvidia A100 GPUs, housed in servers that are more economical than Nvidia’s proprietary DGX-A100 systems.
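As a back-of-the-envelope sanity check (not a calculation from the paper), the node counts and NIC speeds above imply a hard network ceiling that the reported 7.3 TB/s figure sits comfortably under:

```python
# Aggregate network ceiling of the Fire-Flyer 2 storage tier:
# 180 storage nodes, each with 2 x 200 Gbps NICs.
nodes = 180
nics_per_node = 2
gbps_per_nic = 200

total_gbps = nodes * nics_per_node * gbps_per_nic  # 72,000 Gbps
total_tb_s = total_gbps / 8 / 1000                 # bits -> bytes -> TB/s

print(f"Aggregate NIC bandwidth: {total_tb_s} TB/s")
print(f"Reported 3FS read throughput: 7.3 TB/s "
      f"({7.3 / total_tb_s:.0%} of the ceiling)")
```

At roughly 9 TB/s of aggregate NIC bandwidth, the 7.3 TB/s peak means 3FS drives the network at about 80% utilization, which is what makes the figure notable for a distributed file system.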
Benchmarking Performance
DeepSeek reported a benchmark performance of 6.6 TB/s for 3FS while simultaneously running background training tasks that contributed an additional 1.4 TB/s of read throughput. For perspective, the competing Ceph file system recently achieved a read throughput of 1.1 TB/s on a different configuration: 68 nodes, each holding 10 sixteen-TB SSDs and a pair of 100 Gbps NICs.
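Since the two clusters differ in size, a per-node view makes the comparison easier to read. This is a rough illustrative calculation from the figures above, and note the hardware is not identical (the 3FS nodes have twice the NIC bandwidth), so it is not an apples-to-apples benchmark:

```python
# Per-node read throughput from the two benchmarks cited above.
# 3FS: 6.6 TB/s benchmark + 1.4 TB/s background reads over 180 nodes.
# Ceph: 1.1 TB/s over 68 nodes.
fs3_total_tb_s = 6.6 + 1.4
fs3_nodes = 180
ceph_total_tb_s = 1.1
ceph_nodes = 68

fs3_per_node = fs3_total_tb_s * 1000 / fs3_nodes    # GB/s per node
ceph_per_node = ceph_total_tb_s * 1000 / ceph_nodes

print(f"3FS:  {fs3_per_node:.1f} GB/s per node")
print(f"Ceph: {ceph_per_node:.1f} GB/s per node")
```

Even after discounting for the faster NICs, the per-node gap is what makes the 3FS numbers stand out.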
The paper underscores the significance of 3FS, identifying it as an essential element of DeepSeek’s software architecture for training its AI models. Fire-Flyer 2 delivered 80% of the performance of Nvidia’s DGX-A100 at half the cost and with 60% less energy consumption.
Getting Started with 3FS
For those interested in experimenting with the Fire-Flyer File System and its unique random-read-forward methodology tailored for AI and high-performance computing applications, the complete system is available for download on DeepSeek’s GitHub page. The open-source nature of this file system suggests it may attract not only tech enthusiasts but also enterprise users engaged in AI-HPC solutions.
Despite its advantages, the future of 3FS in the market may face challenges, notably because of prevailing skepticism towards technology produced in China. Nonetheless, the potential of this open-source system to revolutionize data handling in AI applications remains a topic of interest as more users explore its capabilities.