Why Every AI PM Needs to Run Evals | Aman Khan, Arize AI Head of Product (Ex. Spotify, Apple, Cruise)

Future Proof: Building AI Products that Last

Paragon

8 episodes

2 days ago

Future Proof brings together top Product and Engineering leaders in B2B SaaS who are building AI products to share learnings and discuss challenges and the latest developments in the future of B2B software. Producer: Forrest Herlick. Host: Ethan Lee.

Technology

39 minutes 43 seconds

3 months ago

In this episode of Future Proof, we sit down with Aman Khan, the Head of Product at Arize AI. Aman reveals why traditional product metrics fail for AI systems and shares Arize's framework for building evaluation systems that actually predict real-world AI performance, plus the emerging PM skills that separate successful AI products from failed experiments.

We discuss:

How AI builders should think about evaluations
The role of the AI PM and how product management is evolving
How to build with the expectation that foundation models will keep changing

(0:00) Highlights

(0:37) Intro

(1:40) What is an AI PM

(4:10) How PMs are evolving with AI

(8:10) The Aha moment in AI

(11:50) How AI builders should think about evaluations

(19:40) How AI builders can best leverage their time in AI evaluations

(23:40) Prompt iteration - if your evaluations are not ideal, how do you iterate?

(27:40) What’s the minimum viable eval someone should write

(30:40) How would prioritization change based on the future of AI models

(36:40) Final thoughts

(38:00) Ethan's reflection

Ship integrations 7x faster https://www.useparagon.com/

Watch all Future Proof episodes: https://www.useparagon.com/future-proof