Principal Site Reliability Engineer @OutSystems

DevOps / Sysadmin

Salary: unspecified
Location: Remote (Portugal)
Job type: Full-time
Posted: 2 weeks ago

Role Description

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goal of SRE is to create scalable and highly reliable systems. Our SREs ensure the reliability, performance, and scalability of our production systems while enabling rapid development and deployment of new features and services.

SREs at OutSystems work closely with development teams, acting as an extension of the team, in adopting the reliability tenets with the shared goal of meeting Service Level Objectives (SLOs) and thus delivering a smooth and frictionless Customer Experience.

What you'll do

Help define and execute the strategic vision and roadmap for the Site Reliability Engineering function.
Provide leadership and mentorship to more junior SREs, fostering a culture of innovation, collaboration, and operational excellence.
Collaborate with leadership and other stakeholders to ensure cross-functional alignment.
Participate actively in, and influence, the design of highly reliable and scalable infrastructure, collaborating with development teams and leveraging cloud technologies and industry best practices.
Collaborate with development teams at all stages of the product development lifecycle to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant.
Drive the adoption, definition, and improvement of Service Level Objectives (SLOs).
Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents.
Oversee incident response, ensuring quick resolution, minimal downtime, and effective root cause analysis (RCA) and post-mortems.
Automate every operational task, with a special focus on fast incident detection & recovery.
Foster a culture of continuous improvement and knowledge sharing.
Communicate effectively with stakeholders, providing updates on system reliability and performance.
Champion reliability as a core product feature, not an afterthought.

Qualifications

STEM degree (BSc or MSc) in Software Engineering, Computer Science, or a related field.
8+ years of experience in Software Engineering or SRE, ideally within high-growth, cloud-native environments.
Expertise in Observability: Proven ability to implement SLIs/SLOs and telemetry systems that provide actionable insights into complex distributed systems.
Cloud Mastery: Deep architectural knowledge of AWS, GCP, or Azure, specifically regarding networking, security, and cost optimization.
Strategic Impact: Demonstrated success in leading cross-functional initiatives that improved system reliability or developer velocity at an organizational scale.
System Design & Architecture: Expertise in designing highly available, fault-tolerant distributed systems (Microservices, Event-driven architecture).
Development: Professional-level proficiency in Go, Python, or Rust, with the ability to contribute to core product codebases and build custom internal tooling.
Cloud Ecosystems: Deep mastery of AWS, GCP, or Azure (specifically IAM, VPC networking, Transit Gateways, and cross-region redundancy).
Orchestration at Scale: Extensive experience managing Kubernetes (K8s) in production, including Custom Resource Definitions (CRDs), Service Mesh (Istio/Linkerd), and Admission Controllers.
Infrastructure as Code (IaC): Advanced usage of Terraform, CloudFormation, or Spacelift, focusing on modularity, state management, and CI/CD integration for infrastructure.

Benefits

A company at the vanguard of the agentic revolution, where we don't just react to AI innovation; we architect it.
Real growth opportunities through structured programs designed to scale your expertise.
A global collective of world-class talent, collaborating with enterprise software legends and sought-after thought leaders.
An inclusive culture where talented individuals from all backgrounds are empowered to learn, experiment, and make an impact.
