
Link: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
This episode of "The AI Research Deep Dive" explores a blog post from Thinking Machines Lab that solves a frustrating mystery: why large language models give different answers to the same prompt even with temperature set to zero and every other setting nominally deterministic. The host explains how the authors debunked the common "concurrency plus floating point" explanation, identifying the true culprit as a lack of "batch invariance" in modern inference libraries. Listeners will learn how the batch a user's request happens to land in changes the reduction order inside the underlying GPU kernels, and, because floating-point addition is not associative, changes the numerical results. The episode covers the team's solution, batch-invariant GPU kernels that compute each request identically regardless of batch size, and discusses the profound implications for achieving bitwise reproducibility and enabling more stable, "truly on-policy" reinforcement learning.
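
For a concrete feel for the failure mode the episode describes, here is a minimal PyTorch sketch (not taken from the episode or the post; the matrix sizes and dtype are illustrative assumptions, and a CUDA GPU is needed to observe the effect). It computes the same row of a matrix product twice, once as a batch of one and once inside a full batch, and checks whether the results match:

```python
import torch

# Sketch of the batch-invariance failure the episode describes.
# Assumes a CUDA GPU; sizes and dtype are illustrative, not from the post.
torch.manual_seed(0)
A = torch.randn(2048, 2048, device="cuda", dtype=torch.bfloat16)
B = torch.randn(2048, 2048, device="cuda", dtype=torch.bfloat16)

# The same row, computed at two different "batch sizes":
row_alone = torch.mm(A[:1], B)      # batch of 1
row_in_batch = torch.mm(A, B)[:1]   # batch of 2048, then slice row 0

# Mathematically identical, but the kernel may choose different tiling
# and reduction strategies per batch size, and floating-point addition
# is not associative -- so the two outputs can differ.
print((row_alone - row_in_batch).abs().max())
```

With a batch-invariant kernel of the kind the team engineered, the printed difference would be exactly zero no matter what else shares the batch.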