Fresh From the Labs
Pioneer Square Labs
23 episodes
3 days ago
Fresh From the Labs is your front-row seat to the future of AI — straight from the builders shaping it. Hosted by the product team at Pioneer Square Labs, a Seattle-based venture studio, each episode dives into the week's most exciting AI breakthroughs, tools, and trends. No hype, just hands-on insight from the people actually prototyping, experimenting, and pushing boundaries with the latest tech. Whether you're building with AI or just trying to keep up, this podcast is your lab-tested shortcut to what matters most.
Technology
All content for Fresh From the Labs is the property of Pioneer Square Labs and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Beyond the Benchmarks: o3 Reality Check, AI Companies, and The Leaderboard Problem
Fresh From the Labs
43 minutes 59 seconds
6 months ago

This week on Fresh From the Labs, we're looking past the leaderboards and hype to explore the real-world challenges and limitations of today's AI.

Can AI actually run a company? We dive into recent CMU research that put AI agents to the test, revealing significant struggles with common-sense tasks and with complex automation such as using a web browser effectively.

The conversation unpacks the performance of specific models like o3, contrasting benchmark achievements with practical usability and the ever-present issue of AI hallucinations. We discuss the dangers these hallucinations pose, especially in critical applications, including how they can subtly mislead users and create more work. We also explain why simply topping a leaderboard (thanks, Goodhart's Law!) doesn't guarantee success for your specific problem.

Join Shilpa, Jared, and Kevin as they discuss the trial-and-error reality of model selection, the importance of truly understanding the problem you're solving, and why promising developments like local models might offer a path forward through some of these current hurdles. It's a candid look at where AI excels and where it still falls short.

Link to Dr. Anthony Diamond's blog post on o1: https://www.psl.com/feed-posts/o1-an-entirely-different-animal---buyer-beware
