[Hiring] Senior Site Reliability Engineer @Datavail
Software Development
Salary: unspecified
Remote Location
Employment Type: full-time
Posted 3d ago


3d ago - Datavail is hiring a remote Senior Site Reliability Engineer. 💸 Salary: unspecified 📍 Location: Colombia

Role Description

  • Define and maintain SLIs/SLOs; monitor alignment with them and error budget usage
  • Lead incident response and postmortems, implement corrective measures
  • Automate operations tasks via tooling (e.g. auto-remediation, scaling rules)
  • Build, improve, and maintain CI/CD pipelines, including canary deployments and blue/green strategies
  • Lead technical discussions with customers to align on reliability, scalability, and performance requirements
  • Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
  • Implement and extend observability systems (metrics, tracing, log aggregation)
  • Optimize performance and cost by tuning cloud services, autoscaling, and resource rightsizing
  • Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
  • Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheading)
  • Participate in architecture discussions around high availability and disaster recovery
  • Mentor mid and junior SREs; conduct reliability design reviews

Qualifications

  • 5–8 years of experience in a reliability or operations role
  • Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
  • Cloud provider certification: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
  • Solid coding skills (Python, Go, or equivalent)
  • Experience with IaC, CI/CD pipelines, and observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
  • Experience working with distributed systems and production-scale services

Nice-to-have Skills

  • Exposure to multi-cloud data replication or cross-cloud networks
  • Experience with chaos engineering or fault injection

Before You Apply

Be aware of the location restriction for this remote position: Colombia
‼ Beware of scams! When applying for jobs, you should NEVER have to pay anything.