[Hiring] Lead Site Reliability Engineer @athenahealth
Lead Site Reliability Engineer @athenahealth
Devops
Salary usd 119,000 - 2..
Remote Location
🇺🇸 USA Only
Employment Type full-time
Posted 1mth ago

1mth ago - athenahealth is hiring a remote Lead Site Reliability Engineer. 💸 Salary: USD 119,000 - 203,000 per year 📍 Location: USA

Role Description

We are looking for a Lead Site Reliability Engineer to join our Cloud Engineering division. Cloud Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth's services.

  • Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
  • Lead efforts to continuously improve system availability, fault tolerance, and disaster recovery capabilities.
  • Ensure proactive incident detection, efficient root cause analysis, and timely resolution of production incidents.
  • Participate in a 12x7 on-call rotation.
  • Drive automation efforts to reduce manual intervention and streamline cloud infrastructure management.
  • Implement Infrastructure as Code (IaC) using tools like Terraform, AWS CloudFormation, and Ansible.
  • Automate deployment, scaling, and monitoring processes to improve efficiency and reduce operational complexity.
  • Design and implement monitoring, logging, and alerting solutions to track cloud infrastructure health, performance, and security.
  • Use observability tools (e.g., Prometheus, Grafana, CloudWatch) to ensure continuous visibility into cloud infrastructure performance and capacity.
  • Identify bottlenecks and performance issues, proposing and implementing improvements to ensure optimal resource usage.
  • Ensure that cloud infrastructure is built with security best practices in mind and meets all relevant compliance and regulatory requirements.
  • Collaborate with security teams to implement security controls and risk mitigation strategies across cloud environments.
  • Regularly audit and review cloud infrastructure for security vulnerabilities and compliance gaps.
  • Work closely with development, DevOps, and operations teams to ensure cloud infrastructure aligns with application and business requirements.
  • Lead and mentor a team of Site Reliability Engineers, promoting best practices and fostering a culture of operational excellence.
  • Act as a key technical point of contact for cloud-related infrastructure and operations issues.
  • Lead the incident response efforts for cloud infrastructure-related issues.
  • Conduct post-incident reviews (PIRs) to identify root causes and implement preventive measures.
  • Continuously refine incident management processes to reduce downtime and shorten recovery times.

Qualifications

  • 10 years of hands-on experience with cloud automation and configuration management tools (e.g., Terraform, AWS CloudFormation, Ansible, Puppet).
  • 7+ years of experience in a Site Reliability Engineering (SRE), Infrastructure Engineering, or DevOps role, with at least 3 years in a technical leadership capacity.
  • Deep knowledge of cloud services and technologies (e.g., EC2, S3, Lambda, Kubernetes).
  • Proficiency in scripting or programming languages (Python, Go, Bash, etc.).
  • Experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
  • Familiarity with Continuous Integration/Continuous Deployment (CI/CD) pipelines and cloud-native development practices.
  • Strong expertise in managing cloud infrastructure (AWS, Google Cloud, Azure) in production environments.
  • Experience with cloud-native architectures, microservices, and containerized environments (Kubernetes, Docker).
  • Proven experience in building and managing highly available, scalable, and fault-tolerant systems in the cloud.
  • Strong understanding of cloud networking, storage, compute services, on-prem infrastructure, and security best practices.
  • Strong knowledge of Linux administration and internals.
  • Effective communication skills, with the ability to translate technical concepts to non-technical stakeholders.

Preferred Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Knowledge of database systems such as MySQL, Oracle or PostgreSQL.
  • Experience with managing on-prem infrastructure at scale.
  • Certifications in AWS, Red Hat, or other relevant technologies are a plus.
  • Experience running containerized workloads (Kubernetes, Docker) in production.

Expected Compensation

$119,000 - $203,000. The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates.

Benefits

  • Health and financial benefits.
  • Perks specific to each location, including commuter support, employee assistance programs, tuition assistance, and employee resource groups.
  • Flexible work-life balance with options for remote work.
  • Events throughout the year, including book clubs, external speakers, and hackathons.