The Human Touch - How RLHF Aligns Models with Our Values

All Things LLM

Mr. Dew

15 episodes

1 month ago

In the grand finale of "All Things LLM," hosts Alex and Ben look ahead to the bleeding edge and reflect on the ultimate question for AI: can we ever truly understand how these models think?

Inside this episode:

The rise of reasoning models: Discover why the next leap for AI isn’t just bigger models, but smarter thinking. Explore how OpenAI’s o1 and DeepSeek-R1 represent a paradigm shift, moving from brute-force “pre-train and scale” to dynamic, inference-time reasoning. Learn how these new mo...

Technology

All content for All Things LLM is the property of Mr. Dew and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

The Human Touch - How RLHF Aligns Models with Our Values

5 minutes

1 month ago

How do we make AI not just smart, but safe and genuinely helpful? In this episode of "All Things LLM," Alex and Ben break down the vital process of alignment: transforming a powerful language model into a trustworthy assistant you can rely on.

Inside this episode:

What is RLHF? Discover Reinforcement Learning from Human Feedback, the multi-stage process that transforms next-word predictors into helpful, instruction-following bots like ChatGPT or Claude.

Step-by-Step Alignment: Supervised Fine-Tun...