Site Reliability Engineer @Pexa

Apr 01, 2025 - Pexa is hiring a remote Site Reliability Engineer. 💸 Salary: unspecified. 📍 Location: UK.

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

The Site Reliability Engineer is responsible for the technical support and operation of UK Platforms (from both an application and an infrastructure perspective), actively managing all incidents to resolution and supporting software releases. The role endeavours to make sure that PEXA Group's support offering for our platform adheres to the highest level of operational and security requirements while delivering a seamless and secure support experience to our customers.

The role is also responsible for additional activities including (but not limited to):

  • Application (e.g. SWIFT SILs), OS and Infrastructure patching
  • DR testing
  • Creation of alerting and monitoring
  • Service transition activities – knowledge transfer, operational playbook and knowledge article updates

The SRE will collaborate closely with the customer support team and the product development squads in various global locations to achieve the best outcome for the technical support of PEXA’s customers and integrated partners. The SRE will also work closely with PEXA AU run teams to stay aligned with PEXA’s strategic direction of creating a consistent, “best in class” support experience for PEXA’s customers globally.

Overall, this role follows through on the vision and execution of the technical support function and is the contact point for technical incidents as well as for the support teams.

Key Accountabilities

  • Ensure high availability and reliability of UK platforms with day-to-day support.
  • Manage incidents with rapid resolution, root cause analysis, and post-mortems to prevent recurrence.
  • Optimise monitoring and alerting to enable proactive issue detection and fast response.
  • Identify process improvements and suggest service management enhancements for long-term stability.
  • Report problems, risks, issues, and change requests to minimise downtime.
  • Coordinate resolution and escalation of Platform Services issues, fostering cross-team collaboration.
  • Manage the Production environment, overseeing incidents, fixes, performance, and stability.
  • Drive continuous improvement by automating processes and enhancing operational performance.
  • Help define the cloud platform service roadmap to enhance system reliability.
  • Collaborate with UK Support and Delivery Squads to address pain points and add value.
  • Assist squads in estimating and resolving Platform Defects that cause incidents.
  • Oversee application, OS, and infrastructure patching, DR testing, monitoring setup, KT, and updating operational playbooks and knowledge articles.

Knowledge & Skills

  • Distributed systems in AWS and/or Azure cloud environments
  • Bring a developer mindset to platform challenges, understanding how software and infrastructure are designed, implemented, and integrated.
  • Strong knowledge of container orchestration and scaling, with experience in managing and troubleshooting workloads.
  • Experience of managing Kubernetes clusters, service mesh and hosted workloads
  • Proficient in observability and monitoring tools, including configuring alerts, creating dashboards, and conducting root cause analysis. Some of the tools we use are: Grafana, Prometheus, Elastic, Splunk.
  • Configuring incident management platforms such as PagerDuty.
  • Hands-on experience with Infrastructure-as-Code (IaC) and automation to improve operational efficiency, using tools like Terraform, Bicep or CloudFormation.
  • Strong understanding of modern SDLC and CI/CD processes, with experience in scripting, automation and version control systems such as Git.
  • Collaborating in DevSecOps, upholding security best practices and compliance standards. Understanding of security frameworks such as the Azure or AWS Well-Architected Frameworks.
  • Experience in high availability (HA) and disaster recovery (DR) strategies and execution.
  • Adept at collaborating with diverse teams across cultures and working effectively under pressure.
  • Empathetic team player who builds strong relationships, tackles challenges, and delivers results while maintaining quality and team morale.
  • Strong understanding of Agile principles, excellent communication skills, and a customer-centric mindset.

GDPR Compliance

Digital Completion UK Limited (trading name “PEXA”), Optima Legal Services Limited (trading name "Optima Legal") and Smoove Limited (a holding company which comprises the following wholly owned trading Subsidiary companies: United Legal Services Limited, United Home Services Limited, Legal-Eye Limited, and Amity Law Limited) are all owned directly by DigCom UK Holdings Limited, which is a wholly owned Subsidiary of PEXA Group Limited in Australia (ACN 140 677 792; ASX: PXA) (referred to collectively as “PEXA Group”).

When we process your applicant personal data for recruitment purposes, we do so as a controller. If, as part of the recruitment process, we share your personal data with another company within the PEXA Group, that company may process your personal data as either an independent controller or, in certain circumstances, a joint controller. By applying for this role, you consent to us processing your personal data in accordance with the UK General Data Protection Regulation ("UK GDPR") and the Data Protection Act 2018, and further information can be found in our privacy notice here.

Category: DevOps / Sysadmin
Salary: unspecified
Remote Location: UK
Job Type: full-time
Posted: Apr 01, 2025