DeepSeek-R1: Reasoning via Reinforcement Learning

Code Impact

Sanket Makhija

72 episodes

Welcome to "Code Impact," the podcast where we explore code that has an impact. Each episode dives deep into real-world stories, practical case studies, and expert insights, showcasing the powerful impact of code on performance, accessibility, and user experience. Whether you're a seasoned developer or just starting your journey, "Code Impact" delivers the tools, tips, and inspiration you need to create meaningful and high-performing products. Join us as we uncover the ways coding is transforming industries and making a difference—one line at a time. NotebookLM creates all episodes.

Education

DeepSeek-R1: Reasoning via Reinforcement Learning

19 minutes 29 seconds

9 months ago

This research paper introduces DeepSeek-R1, a large language model whose reasoning capabilities are improved through reinforcement learning (RL). It presents two models: DeepSeek-R1-Zero, trained purely via RL without supervised fine-tuning, and DeepSeek-R1, which adds multi-stage training and cold-start data to improve readability and performance. DeepSeek-R1 achieves results comparable to OpenAI-o1-1217 on a range of reasoning benchmarks. The study also explores distilling DeepSeek-R1's reasoning capabilities into smaller, more efficient models, which are reported to achieve state-of-the-art results. Finally, the paper discusses unsuccessful attempts with process reward models and Monte Carlo Tree Search, offering insights for future research.

https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf