Hello SundAI - our world through the lense of AI

https://is1-ssl.mzstatic.com/image/thumb/Podcasts221/v4/ce/ef/59/ceef597b-a627-2ee4-f95b-3c0e813acd75/mza_18377625926653678381.jpg/600x600bb.jpg

Roger Basler de Roca

52 episodes

1 week ago

"Hello SundAI - Our World Through the Lens of AI," is your twice-weekly dive into how artificial intelligence shapes our digital landscape. Hosted by Roger and SundAI the AI, this podcast brings you practical tips, cutting-edge tools, and insightful interviews every Sunday and Wednesday morning. Whether you're a seasoned tech enthusiast or just starting to explore the digital domain, tune in to discover innovative ways to get things done and propel yourself forward in a world increasingly driven by AI. Our hashtag is: #helloSundai

Business

RSS

All content for Hello SundAI - our world through the lense of AI is the property of Roger Basler de Roca and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Business

Episodes (20/52)

Hello SundAI - our world through the lense of AI

The Illusion of Thinking: Decoding AI's Reasoning Limits

In this episode, we enter the world of Large Reasoning Models (LRMs).

We explore advanced AI systems such as OpenAI’s o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking—models that generate detailed "thinking processes" (Chain-of-Thought, CoT) with built-in self-reflection before answering.

These systems promise a new era of problem-solving. Yet, their true capabilities, scaling behavior, and limitations remain only partially understood.

By conducting systematic investigations in controlled puzzle environments—including the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World—we uncover both the strengths and surprising weaknesses of LRMs.

These environments allow precise control over task complexity while avoiding data contamination issues that often plague established benchmarks in mathematics and coding.

A striking finding: LRMs face a complete accuracy collapse beyond certain complexity thresholds. Paradoxically, their reasoning effort (measured in "thinking tokens") first increases with complexity, only to decline after a point—even when token budgets are sufficient.

We identify three distinct performance regimes:

Low-complexity tasks – where standard Large Language Models (LLMs) still outperform LRMs.
Medium-complexity tasks – where LRMs’ additional "thinking" shows a clear advantage.
High-complexity tasks – where both LLMs and LRMs collapse entirely.

Another challenge is “overthinking.” On simpler problems, LRMs often find correct solutions early but continue to pursue false alternatives, wasting computational resources. Even more surprising is their weakness in exact computation: they fail to leverage explicit algorithms, even when provided, and show inconsistent reasoning across different puzzle types.

This episode invites you to rethink assumptions about AI’s capacity for generalizable reasoning. What does it truly mean for a machine to "think" under increasing complexity? And how should these insights shape the next generation of AI design and deployment?

Sources: Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. (Unpublished manuscript). https://arxiv.org/abs/2506.06941

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.