Humans of Reliability
Rootly
20 episodes
4 days ago
Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Brondsdon (Galileo) to unpack why observability, monitoring, and human feedback are the missing links to make large language models (LLMs) reliable in production. Conor dives into the shift from traditional test-driven development to evaluation-driven development, where metrics like context adherence, completeness, and ac...
Technology
Scientific Incident Management with Dan Slimmon
Humans of Reliability
37 minutes
8 months ago
Dan Slimmon is an incident management veteran who's worked at Etsy, HashiCorp, and now leads consulting and training on pragmatic, non-bureaucratic incident response. In this episode, Dan shares his philosophy on "scientific incident response," the importance of hypothesis-driven troubleshooting, and why incidents should be seen as normal in complex systems. We also explore:
- Why asking the right questions is more important than knowing all the answers.
- How to use nerd sniping...