Author: Jian Hu

First published on: 2024/11/21

The release of models like Kimi K0-Math, DeepSeek R1 Lite and Qwen QwQ has brought the replication of OpenAI’s O1 models into the spotlight, igniting fervent discussions across the AI community.

Two months ago, I launched an open-source project called Awesome-LLM-Strawberry, a curated collection of research papers, blogs, and projects focusing on OpenAI O1 model replication strategies and reasoning techniques. The repository has garnered over 5,000 stars on GitHub.

By digging into the relevant research and collaborating with experts, I have compiled several potential strategies, some documented and some hypothesized, for replicating O1 models. This post outlines those findings for further exploration.


DeepSeek R1 Lite & Kimi K0-Math & Qwen QwQ

The recent releases of DeepSeek R1 Lite, Kimi K0-Math and Qwen QwQ provide valuable insights into potential approaches for O1 model replication.

Figure: Evaluation results for DeepSeek R1 Lite and Kimi K0-Math

Figure: Evaluation results for Qwen QwQ


Training Phase

Stage 0: Continued Pretraining

Stage 1: Supervised Fine-Tuning (SFT)

Stage 2: Reinforcement Learning for Advanced Reasoning
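Mechanically, Stages 0 and 1 both come down to next-token prediction on curated text: reasoning-heavy corpora for continued pretraining, and long chain-of-thought demonstrations for SFT. As a rough, self-contained illustration, here is a minimal fine-tuning loop in PyTorch with Hugging Face transformers. The base model ("gpt2"), the toy chain-of-thought examples, and the hyperparameters are placeholders chosen for the sketch, not details from O1, R1 Lite, K0-Math, or QwQ.

```python
# Minimal sketch of Stages 0-1 (continued pretraining / SFT).
# All names and hyperparameters are illustrative placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; stands in for a strong base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Toy data: Stage 0 would use reasoning-heavy pretraining text;
# Stage 1 would use curated long chain-of-thought demonstrations.
texts = [
    "Question: What is 12 * 7? Reasoning: 12 * 7 = 84. Answer: 84",
    "Question: Is 97 prime? Reasoning: no divisor up to 9 works. Answer: yes",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    enc["labels"] = enc["input_ids"].clone()        # next-token targets
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding in loss
    return enc

loader = DataLoader(texts, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for batch in loader:
    out = model(**batch)  # the model shifts labels internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {out.loss.item():.3f}")
```

Stage 2 would swap this token-level cross-entropy for a reward-driven objective, for example PPO against outcome or process reward signals, which is beyond the scope of this sketch.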