Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Brondsdon (Galileo) to unpack why observability, monitoring, and human feedback are the missing links to making large language models (LLMs) reliable in production. Conor dives into the shift from traditional test-driven development to evaluation-driven development, where metrics like context adherence, completeness, and ac...
Beyond SLOs: How an ex-Google SRE scaled reliability at the largest e-commerce in the Nordics
Humans of Reliability
7 minutes
9 months ago
What happens when a Google-trained SRE joins a fast-moving e-commerce company? Gastón Rial Saibene, SRE Lead at Boozt.com, joins Humans of Reliability to talk about adapting reliability practices for different company sizes, the limits of SLOs, and the importance of automation. We also dive into decision-making, his favorite books, and—just for fun—whether he’d survive a zombie apocalypse. Tune in for insights, laughs, and a fresh perspective on the world of reliability engineerin...