Best AI papers explained
Enoch H. Kang
524 episodes
1 day ago
Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
Technology
Reasoning with Sampling: Base Models Outperform RL
Best AI papers explained
16 minutes 3 seconds
1 week ago

The research paper "Reasoning with Sampling: Your Base Model is Smarter Than You Think," by Harvard researchers, introduces a training-free iterative sampling algorithm inspired by Markov chain Monte Carlo (MCMC) techniques to enhance the reasoning capabilities of large language models (LLMs) at inference time. The method, termed "Power Sampling," uses the base model's own likelihoods to simulate sampling from a "power distribution," sharpening the distribution toward higher-likelihood sequences without any additional training or reward signal. The authors argue that this elicits latent reasoning skills already present in base models, demonstrating performance on par with, and sometimes exceeding, models post-trained with reinforcement learning (RL), in particular Group Relative Policy Optimization (GRPO), across diverse benchmarks such as MATH500, HumanEval, and GPQA. Crucially, Power Sampling maintains greater generation diversity than RL post-training, which typically suffers a collapse in multi-shot performance.
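
To make the mechanism concrete, here is a minimal Python sketch of one way an MCMC-style power-sampling loop could look: resample a suffix from the base model as the proposal, then accept or reject so the chain targets the base distribution raised to a power alpha > 1. This is an illustration under stated assumptions, not the authors' released code; the model.generate/log-probability interface, the suffix-resampling proposal, and all parameter values are hypothetical.

```python
# Hypothetical interface: model.generate(prompt, prefix=...) returns a token list
# and its total log-probability under the base model. Not the paper's actual API.
import math
import random

def power_sample(model, prompt, alpha=4.0, n_steps=20):
    """Sketch of MCMC-style sampling from p(x)^alpha using only the base model's
    own samples and log-likelihoods (no training, no reward signal)."""
    x, logp_x = model.generate(prompt)                       # initial full completion
    for _ in range(n_steps):
        cut = random.randrange(len(x))                       # keep a random prefix of x
        y, logp_y = model.generate(prompt, prefix=x[:cut])   # propose a fresh suffix
        # Metropolis-Hastings acceptance for target p^alpha with the base model as
        # the proposal: shared-prefix terms cancel, leaving (alpha-1)*(log p(y) - log p(x)).
        log_accept = (alpha - 1.0) * (logp_y - logp_x)
        if log_accept >= 0 or random.random() < math.exp(log_accept):
            x, logp_x = y, logp_y                            # move toward higher-likelihood text
    return x
```

In this sketch, alpha controls how strongly sampling is sharpened toward high-likelihood sequences: alpha = 1 recovers ordinary sampling from the base model, while larger values concentrate probability mass on the model's most likely completions.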
