Principal Cloud Site Reliability Engineer @cyberu
DevOps / Sysadmin
Salary gbp 64,600 - 10..
Remote Location
remote UK
Job Type full-time
Posted 2d ago

[Hiring] Principal Cloud Site Reliability Engineer @cyberu

2d ago - cyberu is hiring a remote Principal Cloud Site Reliability Engineer. 💸 Salary: GBP 64,600 - 103,400 per year 📍 Location: UK

Role Description

We are seeking a Principal Cloud Site Reliability Engineer with strong Incident Management, Kubernetes, and Terraform expertise to ensure the reliability, scalability, and operational excellence of our production platforms. The ideal candidate will combine software engineering, infrastructure automation, and operational rigor to maintain highly available systems while leading and coordinating responses to critical production incidents.

This role requires someone comfortable operating in high-availability cloud environments, managing large-scale distributed systems, and driving incident response, post-incident analysis, and reliability improvements.

Responsibilities

  • Site Reliability Engineering
    • Maintain and improve system reliability, scalability, and performance for production environments.
    • Implement Infrastructure as Code (IaC) using Terraform to manage and automate cloud infrastructure.
    • Design, deploy, and operate Kubernetes clusters and containerized workloads.
    • Build and maintain observability frameworks including monitoring, logging, and alerting.
    • Automate operational tasks to reduce manual interventions and improve system resilience.
  • Incident Management
    • Lead and coordinate Major Incident Management (MIM) during production outages.
    • Act as Incident Commander or technical lead during high severity incidents.
    • Facilitate incident triage, mitigation, communication, and resolution across engineering teams.
    • Drive Root Cause Analysis (RCA) and ensure corrective and preventive actions are implemented.
    • Develop and improve runbooks, playbooks, and operational procedures.
  • Platform & Cloud Operations
    • Manage cloud infrastructure on platforms such as AWS, Azure, or GCP.
    • Optimize cluster performance, scaling, and availability in Kubernetes environments.
    • Implement high availability and disaster recovery strategies.
    • Support CI/CD pipelines and deployment automation.
  • Reliability & Engineering Excellence
    • Define and monitor SLIs, SLOs, and error budgets.
    • Implement proactive reliability improvements and capacity planning.
    • Collaborate with development teams to improve application resilience and observability.
    • Advocate for DevOps and SRE best practices across engineering teams.

Qualifications

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure.
  • Strong experience with Terraform (Infrastructure as Code).
  • Hands-on experience with Kubernetes (EKS, AKS, GKE, or self-managed clusters).
  • Experience with Major Incident Management and production incident response.
  • Strong knowledge of Linux systems and networking fundamentals.
  • Experience with cloud platforms (AWS preferred).
  • Familiarity with monitoring tools such as Prometheus, Grafana, Datadog, or ELK.
  • Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, or similar.
  • Strong scripting skills in Python, Bash, or Go.

Preferred Qualifications

  • Experience managing large-scale distributed systems in production.
  • Experience implementing chaos engineering or resilience testing.
  • Knowledge of security best practices in cloud-native environments.

Benefits

  • Comprehensive compensation package including annual bonuses and program-specific awards.
  • Flexible and empowering career opportunities.
  • Total Rewards strategy based on equitable pay, market-driven research, and skill-based appraisals.
  • Base salary range for this position: GBP 64,600 - 103,400.

Before You Apply

  • Be aware of the location restriction for this remote position: UK
  • Beware of scams! When applying for jobs, you should NEVER have to pay anything.