YAAP (Yet Another AI Podcast)
AI21
10 episodes
1 week ago
Technology
All content for YAAP (Yet Another AI Podcast) is the property of AI21 and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
The Hard Truths About AI Agents: Why Benchmarks Lie and Frameworks Fail
YAAP (Yet Another AI Podcast)
39 minutes
4 months ago
<p>Building AI agents that actually work is harder than the hype suggests, and most people are doing it wrong. In this special "YAAP: Unplugged" episode (a live panel from the AI Tinkerers meetup at the Hugging Face offices in Paris), Yuval sits down with Aymeric Roucher (Project Lead for Agents at Hugging Face) and Niv Granot (Algorithms Group Lead at AI21 Labs) for an unfiltered discussion about the uncomfortable realities of agent development.</p><p><strong>Key Topics:</strong></p><ol><li><strong>Why current benchmarks are broken</strong>: From MMLU's limitations to RAG leaderboards that don't reflect real-world performance</li><li><strong>The tool-use illusion</strong>: Why 95% accuracy on tool-calling benchmarks doesn't mean your agent can actually plan</li><li><strong>LLM-as-a-judge problems</strong>: How evaluation bottlenecks are capping progress compared to verifiable domains like coding</li><li><strong>Frameworks: friend or foe?</strong> When to ditch LangChain and LlamaIndex, and why minimal implementations often work better</li><li><strong>The real agent stack</strong>: MCP, sandbox environments, and the four essential components you actually need</li><li><strong>Beyond the hype cycle</strong>: From embeddings that can't distinguish positive from negative numbers to what comes after agents</li></ol><p>From FIFA World Cup benchmarks that expose retrieval failures to the circular dependency problem with LLM judges, this conversation cuts through the marketing noise to reveal what it <em>really</em> takes to build agents that solve real problems, not just impressive demos.</p><p><em>Warning: Contains unpopular opinions about popular frameworks and uncomfortable truths about the current state of AI agent development.</em></p>