
This episode introduces and evaluates On-Policy Distillation (OPD) as a highly efficient method for the post-training of large language models (LLMs). The authors categorize LLM training into three phases—pre-training, mid-training, and post-training—and distinguish between on-policy training (sampling from the student model) and off-policy training (imitating external sources).
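
The on-policy/off-policy distinction is easier to see in code. Below is a minimal sketch of a single on-policy distillation step, assuming a PyTorch setup: the student samples its own rollout, the teacher grades every token of that rollout, and the student is updated with a per-token reverse KL loss. The `TinyLM` class, the vocabulary and sequence sizes, and the reverse-KL choice are illustrative assumptions for this sketch, not the episode's exact recipe.

```python
# Minimal sketch of one on-policy distillation step (assumes PyTorch).
# TinyLM, vocab size, and sequence length are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, SEQ_LEN = 100, 32, 16

class TinyLM(nn.Module):
    """Toy LM head: each position's logits depend only on its own token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):                 # tokens: (batch, seq)
        return self.head(self.embed(tokens))   # logits: (batch, seq, vocab)

student, teacher = TinyLM(), TinyLM()
teacher.requires_grad_(False)                  # teacher only grades, never trains
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# 1. On-policy rollout: the *student* samples its own trajectory.
tokens = torch.zeros(1, 1, dtype=torch.long)   # token 0 as a start symbol
with torch.no_grad():
    for _ in range(SEQ_LEN):
        next_logits = student(tokens)[:, -1]   # logits for the next position
        nxt = torch.multinomial(F.softmax(next_logits, dim=-1), 1)
        tokens = torch.cat([tokens, nxt], dim=1)

# 2. Teacher grades every token of the student's own rollout.
inputs = tokens[:, :-1]
student_logp = F.log_softmax(student(inputs), dim=-1)
with torch.no_grad():
    teacher_logp = F.log_softmax(teacher(inputs), dim=-1)

# 3. Per-token reverse KL(student || teacher) as the distillation loss.
loss = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(f"on-policy distillation loss: {loss.item():.4f}")
```

The key contrast with off-policy training is step 1: an off-policy variant would instead train the student on sequences drawn from the teacher or another external source, so the student never receives feedback on the states its own sampling actually visits.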