Senior Site Reliability Engineer @Climavision
All Others
Salary $135-170k annually
Remote Location
🇺🇸 USA Only
Employment Type full-time
Posted 2d ago

[Hiring] Senior Site Reliability Engineer @Climavision

2d ago - Climavision is hiring a remote Senior Site Reliability Engineer. 💸 Salary: $135-170k annually 📍 Location: USA

Role Description

Climavision is seeking a Senior Site Reliability Engineer to contribute towards reliability, operational excellence, and production resilience for our customer-facing platform and weather data services. This role is focused on ensuring our systems consistently meet demanding customer SLAs, including a 99.5% availability commitment for radar-derived data services.

  • Own production reliability for Climavision's customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments.
  • Contribute to the definition and improvement of SLIs, SLOs, alerting standards, and operational metrics used to measure platform reliability.
  • Support and coordinate production incident response efforts, including troubleshooting, mitigation, communication, and postmortem analysis.
  • Diagnose and resolve complex production issues across application services, Kubernetes infrastructure, storage, and distributed systems.
  • Drive multi-replica and multi-cluster high availability across Climavision's .NET services.
  • Contribute to the multi-cluster high-availability strategy across Climavision's hybrid fleet.
  • Operate and improve Climavision's self-managed Kubernetes platform.
  • Ensure Kubernetes platform lifecycle activities are executed in a manner that preserves service availability.
  • Improve reliability and operational maturity of production platform services.
  • Design and validate Kubernetes workloads for resiliency, scalability, and operational efficiency.
  • Read, debug, and contribute production-quality C# / .NET code focused on reliability improvements.
  • Partner with software engineering teams to improve production readiness.
  • Maintain and improve deployment pipelines, Helm charts, Kubernetes manifests, and infrastructure automation.
  • Support and evolve Climavision's observability platform.
  • Conduct performance engineering and capacity-planning efforts for customer-facing services during peak weather-event demand.
  • Help facilitate blameless postmortem reviews and drive operational follow-up items through completion.
  • Improve disaster recovery, failover, and business continuity capabilities.
  • Drive operational excellence initiatives.
  • Contribute as a senior technical resource and mentor on reliability engineering and production operations practices.

Qualifications

  • A bachelor's degree in computer science, software engineering, or a related field; equivalent professional experience considered.
  • Minimum of 7 years of experience in Site Reliability Engineering, DevOps, Production Engineering, Platform Engineering, or a related infrastructure-focused role.
  • Strong, hands-on software engineering experience with a minimum of 3 years supporting and modifying C# / .NET applications in production environments.
  • Demonstrated experience refactoring production application code to make services horizontally scalable.
  • Experience designing or operating multi-cluster high-availability architectures.
  • Experience supporting customer-facing production systems with uptime, reliability, and incident-response responsibilities.
  • Strong hands-on experience operating production workloads in self-managed or highly customized Kubernetes environments.
  • Experience diagnosing and resolving production incidents across application, platform, and Kubernetes infrastructure layers.
  • Experience operating Kubernetes outside of strictly managed cloud environments.
  • Strong understanding of infrastructure automation and Infrastructure as Code concepts.
  • Experience supporting CI/CD and production deployment pipelines.
  • Experience with monitoring, logging, and observability platforms.
  • Strong troubleshooting skills across infrastructure, application, and platform layers.
  • Strong written and verbal communication skills.
  • Experience working in start-up, scale-up, or other fast-moving engineering environments.

Requirements

  • Experience operating Kubernetes platforms using RKE2 and Rancher.
  • Experience supporting hybrid cloud and colocation infrastructure environments.
  • Experience with service mesh technologies.
  • Experience with Kubernetes-native storage platforms.
  • Experience operating PostgreSQL or PostGIS in Kubernetes environments.
  • Experience with distributed messaging systems.
  • Experience supporting GPU-enabled workloads in Kubernetes.
  • Familiarity with reliability engineering practices.

Benefits

  • Benefits of a dynamic and growing organization.
  • A challenging, hands-on role that will have real impact on the business.
  • Competitive compensation.
  • Comprehensive benefits package.
  • 401(k) Savings Plan.
  • Medical/Dental/Vision Benefits.
  • Health Savings Account (HSA) and Flexible Spending Account (FSA).
  • Unlimited Paid Time-off.
  • 11 Paid Holidays.
  • Paid Parental Leave.
  • Company Paid Short-term Disability (STD).
  • Company Paid Long-term Disability (LTD).
  • Company Paid Life Insurance.