Smooth Scaling: System Design for High Traffic
Queue-it
15 episodes
1 day ago
Smooth Scaling: System Design for High Traffic focuses on all things scalability, reliability, and performance. Tune in for expert advice on how to scale systems, control costs, boost availability, optimize performance, and get the most out of your tech stack. Host Jose Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. He’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that remain reliable at scale.
Technology
Business
All content for Smooth Scaling: System Design for High Traffic is the property of Queue-it and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
From Chaos to Reliability with Gremlin CEO Kolton Andrus
Smooth Scaling: System Design for High Traffic
44 minutes
4 months ago

In this episode, Kolton Andrus, founder and CEO of Gremlin, dives deep into chaos engineering and reliability testing. Kolton shares his journey from leading reliability efforts at Amazon and Netflix to founding Gremlin, an enterprise reliability platform. He and host Jose Quaresma discuss what it really takes to build resilient systems, the cultural shift required to prioritize reliability, and how Gremlin is working to reshape accountability in engineering teams. From testing dependencies to aligning incentives, this conversation is packed with real-world insights into scaling systems (and teams) that don't break under pressure.

---

Kolton Andrus is the CEO and founder of Gremlin. Previously, he focused on building and operating reliable systems at Netflix and Amazon. At both companies he operated systems at scale, managed company-wide incidents, and helped build out their reliability programs and toolsets.

Host Jose Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. Each week, he’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that perform at scale.

This podcast is hosted by José Quaresma, researched by Joseph Thwaites, and produced by Perseu Mandillo.

  • (00:00) - Intro & Guest: Kolton Andrus
  • (04:20) - Founding Gremlin (2016)
  • (08:47) - Rewarding Invisible Reliability Work
  • (12:27) - Proving Reliability’s Business Value
  • (15:21) - Rethinking the “Chaos Engineering” Label
  • (20:18) - Chaos Testing to Reliability Scores
  • (24:25) - Spreading Reliability Culture Across Teams
  • (28:50) - Safe, Incremental Failure Testing in Prod
  • (33:30) - Load + Fault Testing for Peak Traffic
  • (36:30) - AI’s Opportunities & Risks for Ops
  • (39:30) - Defining Scalability as Elasticity
  • (44:18) - Key Takeaways & Farewell

© Queue-it, 2025 