Mastering Reasoning LLMs: Decoding AI's Complex Problem-Solving Strategies

Smart Enterprises: AI Frontiers

Ali Mehedi

89 episodes

11 hours ago

Welcome to Smart Enterprises: AI Frontiers, where we explore the cutting edge of AI technology and its impact on enterprise and business transformation. Join us as we dive into the latest innovations, strategies, and success stories, helping businesses harness the power of AI to stay competitive in an ever-evolving market. Whether you're an industry leader or just getting started with AI, this podcast is your go-to resource for actionable insights and expert analysis.

Tech News

News

33 minutes 43 seconds

3 months ago

Join us for an insightful exploration into the world of Reasoning LLMs, drawing on the expertise of Sebastian Raschka, PhD. This episode demystifies how Large Language Models (LLMs) are being refined to excel at complex tasks that require intermediate steps, such as solving puzzles, advanced mathematics, and challenging coding problems, moving beyond simple factual question-answering.

We'll uncover the four main approaches currently used to build and improve these specialised reasoning capabilities:

Inference-time scaling: Discover how techniques like Chain-of-Thought (CoT) prompting encourage LLMs to generate intermediate reasoning steps, mimicking a 'thought process' and often leading to more accurate results on complex problems. Because the model generates more tokens per query, this approach uses more compute at inference time and is therefore more expensive to run.
Pure Reinforcement Learning (RL): Learn about the surprising emergence of reasoning behaviour from pure reinforcement learning, as demonstrated by DeepSeek-R1-Zero. This model was trained exclusively with RL, without an initial supervised fine-tuning (SFT) stage, using accuracy and format rewards to develop basic reasoning skills.
Supervised Fine-tuning (SFT) + Reinforcement Learning (RL): Understand this key approach for building high-performance reasoning models, exemplified by DeepSeek's flagship R1 model. This method adds SFT stages and further RL training on top of the pure RL recipe, starting from "cold-start" SFT data generated by the pure RL model.
Pure SFT and Distillation: Explore how smaller, more efficient reasoning models can be created by instruction fine-tuning them on high-quality SFT data generated by larger, stronger LLMs. This approach is particularly attractive for creating models that are cheaper to run and can operate on lower-end hardware.
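
To make the "accuracy and format rewards" idea more concrete, here is a minimal Python sketch of rule-based rewards of the kind described for pure RL training. It is an illustration only, not DeepSeek's actual implementation: the tag layout, the exact-match comparison, and the function names (format_reward, accuracy_reward, total_reward) are all assumptions made for this example.

```python
import re

# Assumed response layout for this sketch: reasoning in <think> tags,
# final answer in <answer> tags. A real pipeline may use another template.
RESPONSE_PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed tag layout, else 0.0."""
    return 1.0 if RESPONSE_PATTERN.match(response.strip()) else 0.0


def accuracy_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted final answer equals the reference string.

    Exact string match is a simplification; math or code tasks would
    normally need a proper checker (for example, running unit tests).
    """
    match = RESPONSE_PATTERN.match(response.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0


def total_reward(response: str, reference: str) -> float:
    """Sum the two rule-based signals (equal weighting is an assumption)."""
    return accuracy_reward(response, reference) + format_reward(response)
```

In this sketch, a well-formatted response with the right answer scores 2.0, a well-formatted response with the wrong answer scores 1.0, and a bare answer without the tags scores 0.0. Because both signals are simple rules rather than a learned reward model, they are cheap to compute for every sampled response.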

We'll also discuss when to use reasoning models: they are ideal for complex challenges but can be inefficient, more verbose, and expensive for simpler tasks, sometimes even being "prone to errors due to 'overthinking'". The episode uses the DeepSeek R1 pipeline as a detailed case study and touches upon comparisons with models like OpenAI's o1. Plus, get tips for developing reasoning models on a limited budget, including the promise of distillation and innovative methods like 'journey learning', which includes incorrect solution paths in the training data so that models can learn from mistakes. Tune in to navigate the rapidly evolving landscape of reasoning LLMs!
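
As a rough illustration of the 'journey learning' idea mentioned above, the Python sketch below builds two SFT training targets from the same worked solution: one that keeps only the correct path, and one that also keeps a wrong step followed by its correction. The Step class, the function names, the "Wait, that is wrong." phrasing, and the arithmetic example are all hypothetical choices for this sketch, not taken from any published pipeline.

```python
from dataclasses import dataclass


@dataclass
class Step:
    text: str              # the step as originally written
    correct: bool = True   # whether the step is valid
    correction: str = ""   # the fixed step, used only when correct is False


def shortcut_target(steps: list[Step], answer: str) -> str:
    """Conventional target: only the correct path is shown to the model."""
    lines = [s.text if s.correct else s.correction for s in steps]
    return "\n".join(lines + [f"Answer: {answer}"])


def journey_target(steps: list[Step], answer: str) -> str:
    """Journey-style target: wrong steps stay in, each followed by a fix."""
    lines = []
    for s in steps:
        lines.append(s.text)
        if not s.correct:
            lines.append(f"Wait, that is wrong. {s.correction}")
    return "\n".join(lines + [f"Answer: {answer}"])


# Hypothetical worked example: computing 17 * 6 with one slip along the way.
steps = [
    Step("17 * 6 = 17 * 5 + 17"),
    Step("17 * 5 = 75", correct=False, correction="17 * 5 = 85"),
    Step("85 + 17 = 102"),
]

shortcut_text = shortcut_target(steps, "102")
journey_text = journey_target(steps, "102")
```

Here shortcut_text never shows the slip, while journey_text keeps the wrong line and the correction that follows it, so a model fine-tuned on such data sees what recovering from an error looks like.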