Principal Site Reliability Engineer @OutSystems
DevOps / Sysadmin
Salary unspecified
Remote Location
Job Type full-time
Posted 2wks ago

[Hiring] Principal Site Reliability Engineer @OutSystems

2wks ago - OutSystems is hiring a remote Principal Site Reliability Engineer. 💸 Salary: unspecified 📍Location: Portugal

Role Description

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goal of SRE is to create scalable and highly reliable systems. Our SREs ensure our production systems' reliability, performance, and scalability while enabling rapid development and deployment of new features and services.

SREs at OutSystems work closely with development teams, acting as an extension of each team to adopt reliability tenets, with the shared goal of meeting Service Level Objectives (SLOs) and thus delivering a smooth and frictionless customer experience.

What you'll do

  • Help define and execute the strategic vision and roadmap for the Site Reliability Engineering function.
  • Provide leadership and mentorship to more junior SREs, fostering a culture of innovation, collaboration, and operational excellence.
  • Collaborate with leadership and other stakeholders to ensure cross-functional alignment.
  • Participate actively in, and influence, the design of highly reliable and scalable infrastructure in collaboration with development teams, leveraging cloud technologies and industry best practices.
  • Collaborate with development teams at all stages of the product development lifecycle to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant.
  • Drive the adoption, definition, and improvement of Service Level Objectives (SLOs).
  • Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents.
  • Oversee incident response efforts, ensuring quick resolution, minimal downtime, and effective root cause analysis (RCA) and post-mortems.
  • Automate every operational task, with a special focus on fast incident detection & recovery.
  • Foster a culture of continuous improvement and knowledge sharing.
  • Communicate effectively with stakeholders, providing updates on system reliability and performance.
  • Champion reliability as a core product feature, not an afterthought.

Qualifications

  • STEM degree (BSc or MSc) in Software Engineering, Computer Science, or a related field.
  • 8+ years of experience in Software Engineering or SRE, ideally within high-growth, cloud-native environments.
  • Expertise in Observability: Proven ability to implement SLIs/SLOs and telemetry systems that provide actionable insights into complex distributed systems.
  • Cloud Mastery: Deep architectural knowledge of AWS/GCP/Azure, specifically regarding networking, security, and cost-optimization.
  • Strategic Impact: Demonstrated success in leading cross-functional initiatives that improved system reliability or developer velocity at an organizational scale.
  • System Design & Architecture: Expertise in designing highly available, fault-tolerant distributed systems (Microservices, Event-driven architecture).
  • Development: Professional-level proficiency in Go, Python, or Rust, with the ability to contribute to core product codebases and build custom internal tooling.
  • Cloud Ecosystems: Deep mastery of AWS, GCP, or Azure (specifically IAM, VPC networking, Transit Gateways, and cross-region redundancy).
  • Orchestration at Scale: Extensive experience managing Kubernetes (K8s) in production, including Custom Resource Definitions (CRDs), Service Mesh (Istio/Linkerd), and Admission Controllers.
  • Infrastructure as Code (IaC): Advanced usage of Terraform, CloudFormation, or Spacelift, focusing on modularity, state management, and CI/CD integration for infrastructure.

Benefits

  • A company at the vanguard of the agentic revolution, where we don’t just react to AI innovation—we architect it.
  • Real growth opportunities through structured programs designed to scale your expertise.
  • A global collective of world-class talent, collaborating with enterprise software legends and sought-after thought leaders.
  • An inclusive culture where talented individuals from all backgrounds are empowered to learn, experiment, and make an impact.

Before You Apply

Be aware of the location restriction for this remote position: Portugal.