Role Description
As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our customer-facing platform. You will work closely with DevOps, DBA, and Development teams to provision and maintain infrastructure, deploy and monitor our applications, and automate workflows. Your contributions will have a direct impact on customer satisfaction and overall experience.
Responsibilities and Deliverables
- Manage, monitor, and maintain highly available systems (Windows and Linux)
- Analyze metrics and trends to ensure rapid scalability
- Address routine service requests while identifying ways to automate and simplify
- Create infrastructure as code using Terraform, ARM templates, and CloudFormation
- Maintain data backups and disaster recovery plans
- Design and deploy CI/CD pipelines using GitHub Actions, Octopus, Ansible, Jenkins, and Azure DevOps
- Adhere to security best practices through all stages of the software development lifecycle
- Follow and champion ITIL best practices and standards
- Become a resource for emerging and existing cloud technologies with a focus on AWS
Organizational Alignment
- Reports to the Senior SRE Manager
- This role involves close collaboration with DevOps, DBA, and security teams
Technical Proficiencies
- Hands-on experience with AWS is a must-have
- Proficiency in analyzing application, IIS, system, and security logs, as well as CloudTrail events
- Practical experience with CI/CD tools such as GitHub Actions, Jenkins, and Octopus
- Experience with observability tools such as New Relic, Application Insights, AppDynamics, or Datadog
- Experience maintaining and administering Windows, Linux, and Kubernetes
- Experience in automation using scripting languages such as Bash, PowerShell, or Python
- Configuration management experience using Ansible, Terraform, Azure Automation runbooks, or similar
- Experience with SQL Server database maintenance and administration is preferred
- Good understanding of networking (VNet, subnet, Private Link, VNet peering)
- Familiarity with cloud concepts including certificates, OAuth, Azure AD, ASE, ASP, AKS, Azure Apps, Load Balancers, Application Gateway, Firewall, API Management, SQL Server, and databases on Azure
Experience
- 5+ years of experience in an SRE or system administration role
- Demonstrated ability building and supporting high-availability Windows/Linux servers, with emphasis on the WISA stack (Windows/IIS/SQL Server/ASP.NET)
- 3+ years of experience with CI/CD tools
- 3+ years of experience working with cloud technologies including AWS and Azure
- 1+ years of experience working with container technology including Docker and Kubernetes
- Comfortable using Scrum, Kanban, or Lean methodologies
Education
- Bachelor's Degree or College Diploma in Computer Science, Information Systems, or equivalent experience