Shibo Hao
reinforcement-learning
an archive of posts with this tag
Oct 24, 2025
From Policy Gradients to LLM RL: TRPO, PPO, and Beyond