Senior Site Reliability Engineer @RapidSOS
DevOps / Sysadmin
Salary $160,000 - $195..
Remote Location
🇺🇸 USA Only
Job Type full-time
Posted 2d ago

2d ago - RapidSOS is hiring a remote Senior Site Reliability Engineer. 💸 Salary: $160,000 - $195,000 📍Location: USA

Role Description

Are you excited to work on systems where reliability directly impacts real-world outcomes? At RapidSOS, we build technology that powers emergency response, ensuring critical data gets to the right place at the right time. When these systems degrade or fail, the impact is real and reliability isn’t a background function. It’s fundamental to how our product shows up in critical moments.

We’re seeking a Senior Site Reliability Engineer to own the performance and stability of services that operate at scale in real-world, high-stakes environments. You’ll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code, identifying and resolving issues at their root cause while proactively shaping how systems are built to improve reliability from the start. You’ll go beyond surface-level fixes, digging into everything from service behavior in Kubernetes to application-level decisions that impact performance, cost, and reliability. You’ll collaborate closely with engineering teams to improve how our systems are built, observed, and operated. Along the way, you’ll help shape how we approach reliability as a discipline—closing visibility gaps, improving resilience, and ensuring our platform performs when it matters most.

What you’ll do:

  • Own performance and reliability outcomes:
    • Ownership of how application-level decisions create system-level impact, including connection pooling, database architecture, traffic routing patterns, and memory allocation.
    • Collaboration with engineering teams that own specific domains, partnering directly to improve reliability and performance across their systems.
  • Design for system resilience:
    • Responsibility for strengthening reliability through proactive design decisions, including safer deployment patterns, failover strategies, and redundancy approaches that improve system behavior under stress.
  • Build observability into system behavior:
    • Proactively instrument services with structured logging, metrics, and alerting so systems are easier to understand and debug.
    • The focus is on creating clear signals from production behavior before issues escalate.
  • Own incidents from signal to resolution:
    • Ownership of production issues from first signal through resolution, including investigation across infrastructure and application layers, root cause identification, and implementation of fixes that restore stability and strengthen system behavior long term.
  • Work across the stack without a permission slip:
    • You’ll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code.
    • When issues come up, you don’t wait for a handoff; you take ownership directly and drive the issue through to resolution.

Qualifications

  • 5+ years of professional engineering experience with deep expertise in Python.
  • Real cloud infrastructure experience with AWS: networking, managed databases, cost implications of traffic routing decisions, IAM, DNS-based routing and failover.
  • Hands-on Kubernetes experience with containerized workloads in production across EKS, ECS, or Fargate.
  • Strong understanding of distributed systems and how they fail, including resource exhaustion, replication lag, queue backpressure, and other common failure modes.
  • Experience operating high-throughput messaging systems (RabbitMQ, Kafka, AWS SNS / SQS, etc.) and the infrastructure around them.
  • Experience building or improving observability through logging, metrics, and alerting.
  • Demonstrable experience using AI safely and securely to enhance velocity and to improve the reliability and recoverability of services.
  • Strong communication and interpersonal skills; a team player with a positive attitude.
  • Highly self-motivated; ability to adapt and learn quickly in a fast-paced environment with a strong sense of ownership.
  • Strong proficiency in coding best practices – ability to write clean, maintainable, and testable code.
  • Demonstrated expertise in problem solving – comfortable working across both infrastructure and application layers to diagnose and resolve issues at the source.
  • Ability and willingness to collaborate in-person a few times per quarter, or as needed.

Requirements

  • Nice-to-have experience (but not required!):
    • Experience supporting production systems in an on-call or similar capacity where reliability matters.
    • Experience with observability and GitOps tooling; hands-on with Datadog (APM, alerting), Elasticsearch/OpenSearch, and ArgoCD-based GitOps deployments.
    • Comfortable modernizing legacy CI/CD pipelines (e.g., Concourse, Jenkins) toward cloud-native approaches.

Benefits

  • The chance to work with a passionate team on solving one of the largest challenges globally.
  • Competitive salary, benefits, and equity participation.
  • A dynamic, flexible and fun start-up work environment with a highly talented team.