In this episode, Sudarshan shares his experience leading high-performing SRE and infrastructure teams at Rippling, Twilio, Walmart, and Epsilon. He talks about reducing CI/CD costs by 60 percent, cutting on-call alerts by 65 percent, and the mindset required to build resilient systems.
In this episode, Madhu Rawat (CTO, Xurrent) sits down with Sakshi — Co-founder and Head of Engineering at Kapstan, with leadership experience at Sumo Logic and UpGrad. They discuss the evolution of observability, building for scale, the role of AI in incident management, and what it means to lead engineering teams through change.
In this episode, Phil (CPO) and Madhu (CTO) from Xurrent sit down with Vishwa and Ankur from Zenduty to talk about ITxM, building for reliability across teams, and how product and platform thinking come together in real-world incident workflows.
In this episode, we speak with Deepak Rajanna, CPTO at SatSure and ex-Amazon, Flipkart, xto10x, about pricing failures at scale, war room lessons from Big Billion Days, and building satellite-powered systems with SRE principles at their core.
In this episode of Incidentally Reliable, we sit down with Amit Rhinde, Head of Engineering at GoDaddy, to uncover the secrets behind building resilient systems, scaling global operations, and ensuring uptime for millions of users.
Amit takes us through his incredible journey, from pioneering SRE practices at Adobe and AWS to leading one of the world's most trusted hosting platforms.
In this episode of Incidentally Reliable, we chat with Denys Pashutynski, Senior Engineering Manager of Site Reliability at Roblox, about the challenges of maintaining gaming reliability for millions. Denys, with experience at companies like Twitter, AWS, and eBay, dives into how Roblox handles latency, traffic spikes, and customer expectations.
We dive into the trenches with Abhishek Ghosh, a veteran who has led SRE teams at Pinterest, and now at Cribl. He shares gripping war room stories from Pinterest, strategies for maintaining uptime, insights into the role of AI in observability, and more! Discover the future of SRE and learn how to navigate the challenges of digital reliability. Tune in to gain valuable lessons from one of the industry's leading experts.
Catch Ramiro Berrelleza, Founder and CEO at Okteto, talk about how impactful DevTool startups are built, the importance of investing in Developer Experience, and the emerging issues in the Cloud Native ecosystem.
Catch Krishnendu Majumdar (CPTO at Yubi) talk about his journey in the dynamic Indian startup ecosystem, strategies to build for scale from Day 1, and insights into building sustained user trust via exceptional product performance in high-governance industries like credit and finance.
Catch Niall Murphy (Co-Founder of Stanza Systems) talk about graceful degradation, what startups are getting wrong about reliability, and how well-thought-out user experiences can communicate credibility to current and potential customers.
Very few people in the last 50 years have changed the way software is built. Solomon continues to contribute to this very mission: building products that make the lives of software developers, operators and maintainers easier everywhere.
Tune in as Solomon shares stories from the early days of Docker, the rollercoaster journey leading to 20 million active developers worldwide, the heavy crown of a tech leader and his vision to revolutionize CI/CD with Dagger today.
Catch Ashutosh Sharma (Director of Engineering at Myntra) talk to Vishwa Krishnakumar as we explore his journey so far, and learn about the culture, the people and the processes that make our favourite fashion destination reliable.
Exclusively on The Incidentally Reliable podcast — made by SREs for SREs, hosted by Zenduty.
Catch Piyush Verma, Co-Founder and CTO at Last9 in conversation with Ankur Rawal, Co-Founder and CTO at Zenduty — discussing what reliability means to the modern consumer, why SREs make excellent decision-makers, and the current state of observability.
Settle in and listen to Suresh Kumar Khemka (Head of Platform & Infra at apna) talk about platform engineering, balancing bureaucracy and velocity at startups and tech giants, and the ripple effects of e-commerce downtime.
Catch Viraj Patel (prev. VP Engg. at BookMyShow, Flipkart) deliver a masterclass in category creation, product innovation, and engineering culture, straight from the front seat of one of the world's biggest entertainment and ticketing companies.
Catch Manoj Sebastian (ex-Flipkart, Amazon, Atlassian, Intuit, Yahoo) talk about the evolution of SRE over 20 years, unique outages and post-incident culture at Big Tech, and the future of reliability with AI ramping up at full speed.
Reliability and DevOps at growing stages, tiffs between Platform Engineering and DevOps, metrics to watch and a lot more, with Manan Verma - Associate Director of Engineering at PhysicsWallah.
The line between DevOps and SRE, building a DevOps Startup, war-room atmosphere at different scales, how to inculcate a culture of reliability into your teams and more.
Sit back, grab some coffee, and get ready for some jaw-dropping and spirited conversations with Rajesh Tilwani, Co-Founder of Humalect.