Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Brondsdon (Galileo) to unpack why observability, monitoring, and human feedback are the missing links to make large language model (LLM) reliable in production. Conor dives into the shift from traditional test-driven development to evaluation-driven development, where metrics like context adherence, completeness, and ac...
All content for Humans of Reliability is the property of Rootly and is served directly from their servers
with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Brondsdon (Galileo) to unpack why observability, monitoring, and human feedback are the missing links to make large language model (LLM) reliable in production. Conor dives into the shift from traditional test-driven development to evaluation-driven development, where metrics like context adherence, completeness, and ac...
In this episode, we sit down with Sean Goedecke, Staff Software Engineer at GitHub, to discuss where LLMs fit into real-world development. Sean shares how he’s using LLMs how he’s drawing the line for AI-assistance in the codebases he manages—though, as he says, this might all change by next summer. Sean also weighs in on how LLMs could assist SREs during outages—especially when you’re only half-awake at 3 a.m. after a rather inconvinient page. Tune in for a nuanced take on ...
Humans of Reliability
Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Brondsdon (Galileo) to unpack why observability, monitoring, and human feedback are the missing links to make large language model (LLM) reliable in production. Conor dives into the shift from traditional test-driven development to evaluation-driven development, where metrics like context adherence, completeness, and ac...