
Arxiv: https://arxiv.org/abs/2509.14234
This episode of "The AI Research Deep Dive" unpacks "Compute as Teacher" (CaT), a paper from Meta and Anthropic that offers a way to train AI models without human-labeled answer keys. The host explains how CaT lets a model teach itself: it first generates several different attempts at a problem ("Exploration"). Listeners will learn the paper's core innovation: rather than simply selecting the best attempt, a "frozen anchor" copy of the model synthesizes the strongest parts of all the attempts into a new, often superior, reference answer. That self-generated answer then serves as the reward signal for improving the original model through reinforcement learning. The episode highlights the stunning results, including a boost of over 30% in math performance, and discusses how this paradigm of turning compute into supervision could unlock a new era of self-improving AI.
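For listeners who want a feel for the loop described above, here is a deliberately toy sketch of the explore-synthesize-reward cycle. Everything in it is hypothetical stand-in code: the real method samples rollouts from an LLM policy, uses a frozen copy of that model as the anchor, and optimizes with reinforcement learning; here attempts are just sets of "solution parts" and synthesis is their union.

```python
# Toy sketch of the CaT-style loop: explore, synthesize, reward.
# All names here (explore, synthesize, reward) are illustrative
# stand-ins, not the paper's actual implementation.
import random


def explore(problem, n_rollouts=4, seed=0):
    """Stand-in for sampling several diverse attempts from the policy.

    Toy model: each attempt recovers a random subset of the true
    solution parts, so no single attempt is guaranteed complete.
    """
    rng = random.Random(seed)
    parts = problem["solution_parts"]
    return [
        set(rng.sample(parts, rng.randint(1, len(parts))))
        for _ in range(n_rollouts)
    ]


def synthesize(attempts):
    """Stand-in for the frozen anchor: merge the best pieces of all
    attempts into one reference answer (here, simply their union)."""
    reference = set()
    for attempt in attempts:
        reference |= attempt
    return reference


def reward(attempt, reference):
    """Score each attempt by its agreement with the synthesized
    reference; this scalar would drive the RL update."""
    return len(attempt & reference) / len(reference)


problem = {"solution_parts": ["setup", "algebra", "substitute", "simplify"]}
attempts = explore(problem)
reference = synthesize(attempts)
rewards = [reward(a, reference) for a in attempts]

# Key observation from the episode: the synthesized reference can
# cover more of the solution than any single attempt does.
assert len(reference) >= max(len(a) for a in attempts)
```

The union step is of course a caricature, but it captures why synthesis can beat selection: the reference can be more complete than the best individual rollout, giving the policy a richer target than any one of its own samples.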