According to Ben Treynor, SRE is “what happens when you ask a software engineer to design an operations function.” SRE teams stand at the crossroads of IT operations and development. They are generally made up of software engineers who are tasked with creating software, deploying it, managing performance issues and ensuring the reliability of the systems.
An SRE team brings operations and development teams closer together and cuts the time lost to support escalations, which gives the team more time to focus on building better features and services.
Below, I have summarised the main roles of a site reliability engineer:
- Creating software or tools for software building, delivery and incident management
With deep knowledge of operating systems and software development, SREs help create solutions for faster software development. They write code, run tests, and monitor or introduce code changes so that software delivery gets faster and incident management improves. They also carry out post-incident reviews, document the findings and then act on what they have found.
- Resolving issues related to support escalation
An SRE team helps to reduce the number of critical incidents and routes each issue to the right people, so that escalations are resolved in near real time.
- Ensuring reliability of the site, IT platforms and services
An SRE prioritizes work using service level objectives (SLOs), service level indicators (SLIs) and service level agreements (SLAs). Apart from tracking these metrics, the team also sets the error budget, which is the amount of unreliability a service is allowed before it misses its SLO. With these metrics, they tune performance and latency and decide how much degradation, such as slower response times under heavy load, is acceptable.
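To make the link between an SLO and an error budget concrete, here is a minimal sketch. It assumes a request-based availability SLI (good requests divided by total requests); the function names and the sample numbers are made up for illustration and do not come from any particular monitoring tool.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """SLI: fraction of requests that succeeded in the measurement window."""
    return (total_requests - failed_requests) / total_requests


def allowed_failures(slo_target: float, total_requests: int) -> int:
    """Error budget expressed as the number of requests allowed to fail."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    # round() absorbs the tiny floating point error in (1 - slo_target)
    return round((1 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = allowed_failures(slo_target, total_requests)
    if budget == 0:
        raise ValueError("window too small to grant any error budget")
    return 1 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO
    print(availability_sli(1_000_000, 250))         # measured availability
    print(allowed_failures(0.999, 1_000_000))       # failures the SLO tolerates
    print(budget_remaining(0.999, 1_000_000, 250))  # share of budget left
```

In this reading, a team with most of its budget left can afford riskier releases, while a team that has spent its budget would slow down and focus on reliability work.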
- Quality Assurance
The SRE team works to stringent product quality metrics and is tasked with detecting flaws in how the software functions, detecting flaws in how it is deployed to production, designing tests and forecasting likely quality problem areas.
The SRE team is also responsible for creating automated build triggers that drive an automated build process, including automated unit and functional tests and automated deployment. Their experience spans the entire software lifecycle, from code development to publishing and deployment in production, which helps them put build automation in place.
- Documentation of tribal knowledge
Because the SRE team works in staging and production and collaborates with technical teams in all their activities, it builds up a large amount of historical knowledge over time. Site reliability engineers are required to document this knowledge so that it can support the activities of all IT teams.
- On-call support
Site reliability engineers are often required to take on-call responsibilities and may be responsible for adding automation and creating context for alerts, which leads to a real-time, collaborative response.
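As a rough illustration of what "creating context for alerts" and routing issues to the right people can look like, the sketch below attaches a runbook link to an alert and picks who should receive it. The service names, team names, runbook paths and the `Alert` class are all hypothetical; a real setup would rely on whatever alerting tool the team already uses.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables; in practice these would live in a service catalogue
OWNERS = {"checkout": "payments-oncall", "search": "search-oncall"}
RUNBOOKS = {"checkout": "runbooks/checkout.md"}


@dataclass
class Alert:
    service: str
    severity: str  # "critical" or "warning"
    summary: str
    context: dict = field(default_factory=dict)


def enrich(alert: Alert) -> Alert:
    """Attach context so the responder does not start from a bare message."""
    alert.context["runbook"] = RUNBOOKS.get(alert.service, "no runbook recorded")
    return alert


def route(alert: Alert) -> str:
    """Page the owning team for critical alerts; queue everything else."""
    if alert.severity != "critical":
        return "ticket-queue"
    # Fall back to the general SRE rotation when the service has no known owner
    return OWNERS.get(alert.service, "sre-oncall")


if __name__ == "__main__":
    alert = enrich(Alert("checkout", "critical", "error rate above threshold"))
    print(route(alert), alert.context)
```

The point of the design is that the person who gets paged also gets the information needed to start working, instead of having to hunt for it during the incident.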
Ismile Technologies is there to help you with its SRE team.