Shibo Hao
Ph.D. student at UCSD
Hello! I’m Shibo Hao, a Ph.D. student at UC San Diego, advised by Zhiting Hu. My research is funded by the Bloomberg Fellowship. Previously, I received my B.S. in Computer Science from Peking University.
My research goal is to push the boundaries of machine reasoning. My work includes training LLMs to reason with reinforcement learning (Guru, OREO, FoR), exploring reasoning in latent space (Coconut, Coconut-theory, Coconut-dynamics), building a system-2 reasoning framework using world-model planning (Reasoning via Planning, Pandora, LLM Reasoners), and augmenting LLMs with external tools (ToolkenGPT).
News
| Date | News |
|---|---|
| Sep 25, 2025 | Guru, our exploration of cross-domain RL for LLM reasoning, and Reasoning by Superposition, a theoretical perspective on Coconut, are accepted to NeurIPS 2025. |
| Apr 14, 2025 | Coconut 🥥 is featured in Quanta Magazine! |
| Dec 21, 2024 | Introducing OREO (Offline REasoning Optimization) (arXiv, Twitter) |
| Dec 9, 2024 | Honored to receive the Bloomberg Data Science Ph.D. Fellowship! |
| Jul 10, 2024 | LLM Reasoners is accepted to the first Conference on Language Modeling (COLM 2024). |
| May 24, 2024 | Check out Pandora, our new work towards a general world model 🌎 |
| Nov 17, 2023 | ToolkenGPT is accepted to NeurIPS 2023 as an oral presentation and received the best paper award at SoCalNLP 2023 🎉! |
| Oct 25, 2023 | Reasoning via Planning (RAP) has been featured in State of AI Report 2023. |
Selected publications
2025
- NeurIPS: Advances in Neural Information Processing Systems, 2025
- ACL: Findings of the Association for Computational Linguistics: ACL 2025
- NeurIPS: Advances in Neural Information Processing Systems, 2025
- COLM
2024
- Preprint: arXiv preprint arXiv:2406.09455, 2024
- COLM: In Conference on Language Modeling (COLM), 2024. Also presented at the Large Language Model (LLM) Agents workshop at ICLR 2024.
2023
- NeurIPS
- EMNLP: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023