stephen wan
06 Jul, 2025
Understanding PPO for LLMs
13 Jun, 2025
Deriving `cross_entropy` loss
04 Jun, 2025
Reproducing deep double descent