Author: Jian Hu

First published on: 2024/11/21

The release of models like Kimi K0-Math, DeepSeek R1 Lite and Qwen QwQ has brought the replication of OpenAI’s O1 models into the spotlight, igniting fervent discussions across the AI community.

Two months ago, I launched an open-source project called Awesome-LLM-Strawberry, a curated collection of research papers, blogs, and projects focusing on OpenAI O1 model replication strategies and reasoning techniques. The repository has garnered over 5,000 stars on GitHub.

By digging into the relevant research and collaborating with experts, I have compiled several potential strategies, some documented and some hypothesized, for replicating O1 models. This post outlines those findings for further exploration.


DeepSeek R1 Lite & Kimi K0-Math & Qwen QwQ

The recent releases of DeepSeek R1 Lite, Kimi K0-Math and Qwen QwQ provide valuable insights into potential approaches for O1 model replication.

Figure: Evaluation results for DeepSeek R1 Lite and Kimi K0-Math

Figure: Evaluation results for Qwen QwQ


Training Phase

Stage 0: Continued Pretraining

Stage 1: Supervised Fine-Tuning (SFT)

Stage 2: Reinforcement Learning for Advanced Reasoning
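Mechanically, Stages 0 and 1 both come down to next-token prediction on curated text: reasoning-heavy corpora for continued pretraining, and long chain-of-thought demonstrations for SFT. As a rough, self-contained illustration, here is a minimal fine-tuning loop in PyTorch with Hugging Face transformers. The base model ("gpt2"), the toy chain-of-thought examples, and the hyperparameters are placeholders chosen for the sketch, not details from O1, R1 Lite, K0-Math, or QwQ.

```python
# Minimal sketch of Stages 0-1 (continued pretraining / SFT).
# All names and hyperparameters are illustrative placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; stands in for a strong base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Toy data: Stage 0 would use reasoning-heavy pretraining text;
# Stage 1 would use curated long chain-of-thought demonstrations.
texts = [
    "Question: What is 12 * 7? Reasoning: 12 * 7 = 84. Answer: 84",
    "Question: Is 97 prime? Reasoning: no divisor up to 9 works. Answer: yes",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    enc["labels"] = enc["input_ids"].clone()        # next-token targets
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding in loss
    return enc

loader = DataLoader(texts, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for batch in loader:
    out = model(**batch)  # the model shifts labels internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {out.loss.item():.3f}")
```

Stage 2 would swap this token-level cross-entropy for a reward-driven objective, for example PPO against outcome or process reward signals, which is beyond the scope of this sketch.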