Best AI papers explained
Enoch H. Kang
524 episodes
1 day ago
Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
Technology
The Coverage Principle: How Pre-Training Enables Post-Training
16 minutes 11 seconds
1 week ago

This paper gives a theoretical analysis of next-token prediction in language models, introducing the coverage profile ($\text{Cov}_N$) as a better predictor than cross-entropy of downstream performance under Best-of-N (BoN) sampling. The authors establish a "coverage principle": maximum likelihood (next-token prediction) implicitly optimizes the coverage profile, yielding faster generalization that avoids the spurious dependence on sequence length that afflicts cross-entropy and KL divergence. They show that a good coverage profile is both necessary and sufficient for BoN success, derive scaling laws relating cross-entropy to coverage, and analyze optimization methods such as stochastic gradient descent (SGD) and gradient normalization that provably improve coverage bounds. Finally, the paper proposes tournament-style estimators for selecting the model with the best coverage when the true data distribution is unknown.
