Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes well beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Bronsdon (Galileo) to unpack why observability, monitoring, and human feedback are the missing links to making large language models (LLMs) reliable in production. Conor dives into the shift from traditional test-driven development to evaluation-driven development, where metrics like context adherence, completeness, and ac...
The Golden Path to Nowhere: When Platforms Undermine Reliability with Chase Roberts (Northflank)
Humans of Reliability
27 minutes
5 months ago
Internal platforms promise speed, consistency, and scale, but what happens when they become a distraction? In this episode, Chase Roberts, COO at Northflank, joins Sylvain Kalache to examine the quiet ways platforms erode developer experience when not planned carefully. From abandoned golden paths to shadow deployments and brittle YAML pipelines, Chase walks us through:
- Why early PaaS got developer experience right and what it missed
- The cultural bias toward building over bu...