Link to the article: https://dev.to/developertharun/8-ways-to-become-a-better-sre-right-now-8-non-technical-characteristics-to-have-3n4p
Link to the YouTube video: https://youtu.be/2drsyhJzcao
Subscribe the podcast if you like it!
Thanks for listening.
Thank you for listening to my Podcast. Follow my podcast if you find it helpful.
Check out my other episodes. I talk about programming & software engineering.
YouTube: https://youtube.com/c/developerTharun
Blog Article on: https://tharunshiv.com
Instagram: @developerTharun
Dev.to: https://dev.to/developertharun
Udemy: https://www.udemy.com/user/tharun-shiv/
LinkedIn: https://linkedin.com/in/tharunshiv
Link to the article: https://dev.to/developertharun/8-ways-to-become-a-better-sre-right-now-8-non-technical-characteristics-to-have-3n4p
Link to the YouTube video: https://youtu.be/2drsyhJzcao
Subscribe the podcast if you like it!
Thanks for listening.
Thank you for listening to my Podcast. Follow my podcast if you find it helpful.
Check out my other episodes. I talk about programming & software engineering.
YouTube: https://youtube.com/c/developerTharun
Blog Article on: https://tharunshiv.com
Instagram: @developerTharun
Dev.to: https://dev.to/developertharun
Udemy: https://www.udemy.com/user/tharun-shiv/
LinkedIn: https://linkedin.com/in/tharunshiv
Link to the article: https://dev.to/developertharun/8-ways-to-become-a-better-sre-right-now-8-non-technical-characteristics-to-have-3n4p
Link to the YouTube video: https://youtu.be/2drsyhJzcao
Subscribe the podcast if you like it!
Thanks for listening.
Thank you for listening to my Podcast. Follow my podcast if you find it helpful.
Check out my other episodes. I talk about programming & software engineering.
YouTube: https://youtube.com/c/developerTharun
Blog Article on: https://tharunshiv.com
Instagram: @developerTharun
Dev.to: https://dev.to/developertharun
Udemy: https://www.udemy.com/user/tharun-shiv/
LinkedIn: https://linkedin.com/in/tharunshiv
Link to the article: https://dev.to/developertharun/8-ways-to-become-a-better-sre-right-now-8-non-technical-characteristics-to-have-3n4p
Link to the YouTube video: https://youtu.be/2drsyhJzcao
Subscribe the podcast if you like it!
Thanks for listening.
Link to the article: https://dev.to/developertharun/8-ways-to-become-a-better-sre-right-now-8-non-technical-characteristics-to-have-3n4p
Link to the YouTube video: https://youtu.be/2drsyhJzcao
Subscribe the podcast if you like it!
Thanks for listening.
a. No blame game
b. Thirst to solve
As an SRE we deal with multiple components and are a bridge between the users and the application. Even though the application is well written, a bigger responsibility falls upon SRE to keep the applications and the services it uses up and running. In this process, there might be a few situations where one of the SRE does a mistake that causes a disruption or even an outage. When this happens, the first thing to happen shouldn't be to blame anyone for the outage, but the following has to be performed.
i. Fix the issue
ii. Write an RCA ( Root Cause Analysis ) that mentions why the issue occurred in the first place, the names can be anonymous.
iii. Mention the first aid and the fix for the issue
iv. Discuss how the issue can be prevented the next time
v. Set an ETA for the fix
Another aspect is to have the right mindset to solve problems. As an SRE you have the responsibility to optimize the infrastructure, fix issues, build automation tools, monitoring tools, and more, which requires a lot of problem-solving skills. Unless you have the thirst to solve the problems, you will only feel more stressed out, or even worse, would cause issues.
a. Overcommunication is not a problem
b. Be kind and show empathy
Are you performing a production activity or even a stage change that could affect other teams? Have you made progress in the project that you are working on? Make sure to keep the necessary stakeholders in sync always. Write emails, send slack messages well in advance before the production activity, just before and after the activity. It might sound like over-communication, but trust me, as the company scales, you need to keep everyone relevant to the component that you are working on in sync. This way, if they have to take any actions from their side, they will do it, or if they face any issues post-activity they'll know who the right person to get in touch with is.
One other important characteristic to have as a human being is to be kind and show empathy. This will apply to all levels of engineering on either side of the conversation, period. Whether someone asks a silly question, or does a mistake, or behaves rudely with you, you should never mirror that behavior.
a. Do not miss team meetings
b. Prevent duplication of work
c. Do not compete, but contribute
In this work from home ( WFH ) period, the only time where you have an opportunity to speak to your teammates is during a team meet. The reason why this is special is, you get an opportunity to stay synced with your team on what they all are working on, whether they are blocked on any tasks, how you can contribute to their tasks and also you will be using this opportunity to convey on what you are working on and get help if necessary. This also prevents duplication of work.
The best way to learn is by doing it hands-on and the best way to begin would be by watching how it is done. I also believe that the best way to retain the learned information is by performing it repeatedly. This also includes watching your teammates perform the activities. It ensures that the activity is done without any mistakes when there are several eyes to watch it.
Do not expect all details to be taught by your teammates and seniors. Read the documentation, watch tutorials, read engineering blogs, practice on your own, and suggest improvisations. Even a well-built system will have much more efficient solutions, that you can propose
a. No blame game
b. Thirst to solve
As an SRE we deal with multiple components and are a bridge between the users and the application. Even though the application is well written, a bigger responsibility falls upon SRE to keep the applications and the services it uses up and running. In this process, there might be a few situations where one of the SRE does a mistake that causes a disruption or even an outage. When this happens, the first thing to happen shouldn't be to blame anyone for the outage, but the following has to be performed.
i. Fix the issue
ii. Write an RCA ( Root Cause Analysis ) that mentions why the issue occurred in the first place, the names can be anonymous.
iii. Mention the first aid and the fix for the issue
iv. Discuss how the issue can be prevented the next time
v. Set an ETA for the fix
Another aspect is to have the right mindset to solve problems. As an SRE you have the responsibility to optimize the infrastructure, fix issues, build automation tools, monitoring tools, and more, which requires a lot of problem-solving skills. Unless you have the thirst to solve the problems, you will only feel more stressed out, or even worse, would cause issues.
a. Overcommunication is not a problem
b. Be kind and show empathy
Are you performing a production activity or even a stage change that could affect other teams? Have you made progress in the project that you are working on? Make sure to keep the necessary stakeholders in sync always. Write emails, send slack messages well in advance before the production activity, just before and after the activity. It might sound like over-communication, but trust me, as the company scales, you need to keep everyone relevant to the component that you are working on in sync. This way, if they have to take any actions from their side, they will do it, or if they face any issues post-activity they'll know who the right person to get in touch with is.
One other important characteristic to have as a human being is to be kind and show empathy. This will apply to all levels of engineering on either side of the conversation, period. Whether someone asks a silly question, or does a mistake, or behaves rudely with you, you should never mirror that behavior.
a. Do not miss team meetings
b. Prevent duplication of work
c. Do not compete, but contribute
In this work from home ( WFH ) period, the only time where you have an opportunity to speak to your teammates is during a team meet. The reason why this is special is, you get an opportunity to stay synced with your team on what they all are working on, whether they are blocked on any tasks, how you can contribute to their tasks and also you will be using this opportunity to convey on what you are working on and get help if necessary. This also prevents duplication of work.
The best way to learn is by doing it hands-on and the best way to begin would be by watching how it is done. I also believe that the best way to retain the learned information is by performing it repeatedly. This also includes watching your teammates perform the activities. It ensures that the activity is done without any mistakes when there are several eyes to watch it.
Do not expect all details to be taught by your teammates and seniors. Read the documentation, watch tutorials, read engineering blogs, practice on your own, and suggest improvisations. Even a well-built system will have much more efficient solutions, that you can propose
Link to the complete episode: https://anchor.fm/dashboard/episode/e1cjm7b
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...
Site Reliability Engineering, also popularly referred to as the SRE, is a role in Computer Science Engineering where the main purpose is to provision, maintain, monitor, and manage the infrastructure in order to provide maximum application uptime and reliability. SRE is an emerging role, but the tasks that the SRE does were always there ever since the first application that was developed. The scope of the software developers ends where they write code to develop the application and right from setting up the infrastructure, the various services that run on them, the network connectivity that is required, providing a platform for the application to run and making sure every part of the application is up and running reliably 24x7 is the duty of an SRE. In fact, we can consider Site Reliability Engineers are the strong bridge between the users and a reliable application.
Now, in order to explain the different responsibilities of an SRE, I have divided it into 4 different categories. I have always seen SRE this way, and definitely not as some ad-hoc process. The four categories in which I would classify the tasks of a Site Reliability Engineer are:
Let's dive deep into each one of them.
SREs are responsible for provisioning the virtual machines with the requested resources in terms of CPU, memory, disks, network configurations, and operating system. They are also responsible to be rack aware during provisioning. Example operating systems involve Linux Ubuntu, CentOS, Windows.
Example technologies involve NGINX, Apache, RabbitMQ, Kafka, Hadoop, Traefik, MySQL, PostgreSQL, Aerospike, MongoDB, Redis, MinIO, Kubernetes, Apache Mesos, Marathon, MariaDB, Galera.
Since there are several components and services that are being used in the infrastructure, there is a scope for improvements in terms of performance, efficiency, and security. The SRE optimizes the components by keeping them up to date, choosing the right service for the right job, patching the servers.
When the SRE are involved in maintaining an infrastructure of any size, they never underestimate any component of the infrastructure and write a monitoring script to monitor the components and metrics of each and every one of them. This provides the ability to get real-time alerts on any of the components malfunctioning and also a better view of the infrastructure. The SRE uses programming languages like Bash, Python, Golang, Perl, and tools like daemon processes, Riemann, InfluxDB, OpenTSDB, Kafka, Grafana, Prometheus, and APIs to monitor the infrastructure
If there are more than 10 steps to be performed and chances are that the task has to be performed more than once, the SRE never hesitate to automate the task. This saves time and also prevents human error. The SRE uses programming languages like Bash, Python, Golang, Perl, Ansible to automate the tasks.
Site Reliability Engineering, also popularly referred to as the SRE, is a role in Computer Science Engineering where the main purpose is to provision, maintain, monitor, and manage the infrastructure in order to provide maximum application uptime and reliability. SRE is an emerging role, but the tasks that the SRE does were always there ever since the first application that was developed. The scope of the software developers ends where they write code to develop the application and right from setting up the infrastructure, the various services that run on them, the network connectivity that is required, providing a platform for the application to run and making sure every part of the application is up and running reliably 24x7 is the duty of an SRE. In fact, we can consider Site Reliability Engineers are the strong bridge between the users and a reliable application.
Now, in order to explain the different responsibilities of an SRE, I have divided it into 4 different categories. I have always seen SRE this way, and definitely not as some ad-hoc process. The four categories in which I would classify the tasks of a Site Reliability Engineer are:
Let's dive deep into each one of them.
SREs are responsible for provisioning the virtual machines with the requested resources in terms of CPU, memory, disks, network configurations, and operating system. They are also responsible to be rack aware during provisioning. Example operating systems involve Linux Ubuntu, CentOS, Windows.
Example technologies involve NGINX, Apache, RabbitMQ, Kafka, Hadoop, Traefik, MySQL, PostgreSQL, Aerospike, MongoDB, Redis, MinIO, Kubernetes, Apache Mesos, Marathon, MariaDB, Galera.
Since there are several components and services that are being used in the infrastructure, there is a scope for improvements in terms of performance, efficiency, and security. The SRE optimizes the components by keeping them up to date, choosing the right service for the right job, patching the servers.
When the SRE are involved in maintaining an infrastructure of any size, they never underestimate any component of the infrastructure and write a monitoring script to monitor the components and metrics of each and every one of them. This provides the ability to get real-time alerts on any of the components malfunctioning and also a better view of the infrastructure. The SRE uses programming languages like Bash, Python, Golang, Perl, and tools like daemon processes, Riemann, InfluxDB, OpenTSDB, Kafka, Grafana, Prometheus, and APIs to monitor the infrastructure
If there are more than 10 steps to be performed and chances are that the task has to be performed more than once, the SRE never hesitate to automate the task. This saves time and also prevents human error. The SRE uses programming languages like Bash, Python, Golang, Perl, Ansible to automate the tasks.
Site Reliability Engineering, also popularly referred to as the SRE, is a role in Computer Science Engineering where the main purpose is to provision, maintain, monitor, and manage the infrastructure in order to provide maximum application uptime and reliability. SRE is an emerging role, but the tasks that the SRE does were always there ever since the first application that was developed. The scope of the software developers ends where they write code to develop the application and right from setting up the infrastructure, the various services that run on them, the network connectivity that is required, providing a platform for the application to run and making sure every part of the application is up and running reliably 24x7 is the duty of an SRE. In fact, we can consider Site Reliability Engineers are the strong bridge between the users and a reliable application.
Now, in order to explain the different responsibilities of an SRE, I have divided it into 4 different categories. I have always seen SRE this way, and definitely not as some ad-hoc process. The four categories in which I would classify the tasks of a Site Reliability Engineer are:
Let's dive deep into each one of them.
SREs are responsible for provisioning the virtual machines with the requested resources in terms of CPU, memory, disks, network configurations, and operating system. They are also responsible to be rack aware during provisioning. Example operating systems involve Linux Ubuntu, CentOS, Windows.
Example technologies involve NGINX, Apache, RabbitMQ, Kafka, Hadoop, Traefik, MySQL, PostgreSQL, Aerospike, MongoDB, Redis, MinIO, Kubernetes, Apache Mesos, Marathon, MariaDB, Galera.
Since there are several components and services that are being used in the infrastructure, there is a scope for improvements in terms of performance, efficiency, and security. The SRE optimizes the components by keeping them up to date, choosing the right service for the right job, patching the servers.
When the SRE are involved in maintaining an infrastructure of any size, they never underestimate any component of the infrastructure and write a monitoring script to monitor the components and metrics of each and every one of them. This provides the ability to get real-time alerts on any of the components malfunctioning and also a better view of the infrastructure. The SRE uses programming languages like Bash, Python, Golang, Perl, and tools like daemon processes, Riemann, InfluxDB, OpenTSDB, Kafka, Grafana, Prometheus, and APIs to monitor the infrastructure
If there are more than 10 steps to be performed and chances are that the task has to be performed more than once, the SRE never hesitate to automate the task. This saves time and also prevents human error. The SRE uses programming languages like Bash, Python, Golang, Perl, Ansible to automate the tasks.
Site Reliability Engineering, also popularly referred to as the SRE, is a role in Computer Science Engineering where the main purpose is to provision, maintain, monitor, and manage the infrastructure in order to provide maximum application uptime and reliability. SRE is an emerging role, but the tasks that the SRE does were always there ever since the first application that was developed. The scope of the software developers ends where they write code to develop the application and right from setting up the infrastructure, the various services that run on them, the network connectivity that is required, providing a platform for the application to run and making sure every part of the application is up and running reliably 24x7 is the duty of an SRE. In fact, we can consider Site Reliability Engineers are the strong bridge between the users and a reliable application.
Now, in order to explain the different responsibilities of an SRE, I have divided it into 4 different categories. I have always seen SRE this way, and definitely not as some ad-hoc process. The four categories in which I would classify the tasks of a Site Reliability Engineer are:
Let's dive deep into each one of them.
SREs are responsible for provisioning the virtual machines with the requested resources in terms of CPU, memory, disks, network configurations, and operating system. They are also responsible to be rack aware during provisioning. Example operating systems involve Linux Ubuntu, CentOS, Windows.
Example technologies involve NGINX, Apache, RabbitMQ, Kafka, Hadoop, Traefik, MySQL, PostgreSQL, Aerospike, MongoDB, Redis, MinIO, Kubernetes, Apache Mesos, Marathon, MariaDB, Galera.
Since there are several components and services that are being used in the infrastructure, there is a scope for improvements in terms of performance, efficiency, and security. The SRE optimizes the components by keeping them up to date, choosing the right service for the right job, patching the servers.
When the SRE are involved in maintaining an infrastructure of any size, they never underestimate any component of the infrastructure and write a monitoring script to monitor the components and metrics of each and every one of them. This provides the ability to get real-time alerts on any of the components malfunctioning and also a better view of the infrastructure. The SRE uses programming languages like Bash, Python, Golang, Perl, and tools like daemon processes, Riemann, InfluxDB, OpenTSDB, Kafka, Grafana, Prometheus, and APIs to monitor the infrastructure
If there are more than 10 steps to be performed and chances are that the task has to be performed more than once, the SRE never hesitate to automate the task. This saves time and also prevents human error. The SRE uses programming languages like Bash, Python, Golang, Perl, Ansible to automate the tasks.
Site Reliability Engineering, also popularly referred to as the SRE, is a role in Computer Science Engineering where the main purpose is to provision, maintain, monitor, and manage the infrastructure in order to provide maximum application uptime and reliability. SRE is an emerging role, but the tasks that the SRE does were always there ever since the first application that was developed. The scope of the software developers ends where they write code to develop the application and right from setting up the infrastructure, the various services that run on them, the network connectivity that is required, providing a platform for the application to run and making sure every part of the application is up and running reliably 24x7 is the duty of an SRE. In fact, we can consider Site Reliability Engineers are the strong bridge between the users and a reliable application.
Now, in order to explain the different responsibilities of an SRE, I have divided it into 4 different categories. I have always seen SRE this way, and definitely not as some ad-hoc process. The four categories in which I would classify the tasks of a Site Reliability Engineer are:
Let's dive deep into each one of them.
SREs are responsible for provisioning the virtual machines with the requested resources in terms of CPU, memory, disks, network configurations, and operating system. They are also responsible to be rack aware during provisioning. Example operating systems involve Linux Ubuntu, CentOS, Windows.
Example technologies involve NGINX, Apache, RabbitMQ, Kafka, Hadoop, Traefik, MySQL, PostgreSQL, Aerospike, MongoDB, Redis, MinIO, Kubernetes, Apache Mesos, Marathon, MariaDB, Galera.
Since there are several components and services that are being used in the infrastructure, there is a scope for improvements in terms of performance, efficiency, and security. The SRE optimizes the components by keeping them up to date, choosing the right service for the right job, patching the servers.
When the SRE are involved in maintaining an infrastructure of any size, they never underestimate any component of the infrastructure and write a monitoring script to monitor the components and metrics of each and every one of them. This provides the ability to get real-time alerts on any of the components malfunctioning and also a better view of the infrastructure. The SRE uses programming languages like Bash, Python, Golang, Perl, and tools like daemon processes, Riemann, InfluxDB, OpenTSDB, Kafka, Grafana, Prometheus, and APIs to monitor the infrastructure
If there are more than 10 steps to be performed and chances are that the task has to be performed more than once, the SRE never hesitate to automate the task. This saves time and also prevents human error. The SRE uses programming languages like Bash, Python, Golang, Perl, Ansible to automate the tasks.
Encryption and it's types, decryption
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...
link to the previous episode: https://anchor.fm/developertharun/episodes/1-How-does-WhatsApp-encrypt-end-to-end-backups---Part-1--A-system-perspective--Cryptography--Tharun-Shiv--About-encryption--Decryption-e1cgt6j
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...
Vault stores data in encrypted format. The encryption key that is being used to encrypt/decrypt the data is also stored along with rest of the data in the keyring. When a Vault server starts, it knows where the data resides through the configuration that we provide Vault with but doesn't know how to decrypt the encryption key that is present in the keyring along with the Vault encrypted data.
Here comes the master key that is used to decrypt the encryption key which is also present alongside all other Vault data. This master key is also encrypted and we need a special key that can decrypt the master key, this key is known as Unseal key.
The Unseal key is generated during the init process using an algorithm known as 'Shamir's secret sharing', where the unseal key is split into certain number of unseal keys 'X' and every time we want to unseal the Vault server we will need a certain number of unseal keys 'Y' and these 'X' and 'Y' values can be decided by the Vault architect when initializing the Vault server.
The main intention of creating several unseal keys is to distribute these unseal keys among several stakeholders such that, a minimum number of stake holders are needed to unseal the server or perform major operations on the server.
What are policies?Policies help you create rules that define access to various secrets. We can create policies that allow certain level access like create access, update access, read access, delete access and so on. We then assign this policy to a particular authentication mechanism of a user. This user will have only those access mentioned in the policies attached to his credentials. This way, Vault makes sure that we provide minimal and only necessary access to Vault stakeholders.
Hey there! Follow the podcast if you like the episode
This is Tharun. In the Developer Tharun Podcast, I speak about Software Engineering
Thank you for Listening
And more...