Overcommitted
overcommitted.dev
32 episodes
A handful of overcommitted software engineers talking about our commits and our commitments.
Technology
Ep. 16 | Understanding Software Availability with Ross Brodbeck
44 minutes 19 seconds
3 months ago

Summary

In this episode of the Overcommitted Podcast, Brittany Ellich and her co-hosts engage with Ross Brodbeck, a software engineer at GitHub, to explore the critical topic of software availability. They discuss the definitions of availability, reliability, and uptime, and delve into frameworks for improving availability in software systems. The conversation covers proactive versus reactive approaches to availability, the business impact of availability, and the hidden costs associated with downtime. Ross shares insights on creating effective availability programs, the role of incident commanders, and emerging technologies that may shape the future of availability in software engineering. The episode concludes with book recommendations for software engineers looking to deepen their understanding of the field.

Takeaways

  • Availability is subjective and varies by organization.

  • Observability is crucial for understanding production behavior.

  • Proactive measures can help prevent availability issues.

  • On-call burnout is a significant cost to organizations.

  • Understanding business needs is key to defining availability.

  • SLOs help in measuring and reporting availability effectively.

  • Incident commanders play a vital role in managing incidents.

  • Game days and playbooks are essential for preparedness.

  • Hidden costs of downtime include loss of customer trust.

  • Emerging technologies like AI may change availability management.

Links

  • Ross’s Blog

  • Google SRE Book

  • https://sreweekly.com/

  • https://uptime.is/

  • Catchpoint SRE Report

  • The Software Engineer’s Guidebook

  • Designing Data-Intensive Applications

  • Thinking in Systems

  • The Best Software Writing I - Joel on Software

  • Algorithms to Live By

  • The Staff Engineer

  • Clean Code

  • Pragmatic Engineer Podcast - Thomas Dohmke interview

  • Distributed Systems by Maarten van Steen

  • Practical Object-Oriented Design in Ruby

  • Looks Good To Me

  • Tech Book Club Repo

  • Overcommitted Discord

Hosts

  • Overcommitted.dev

  • Bethany Janos

  • Brittany Ellich

  • Eggyhead

  • Jonathan Tamsut

