Lead Site Reliability Engineer @athenahealth

Devops

Salary: USD 119,000 - 203,000	Remote Location: 🇺🇸 USA Only
Employment Type: Full-time	Posted: 1mth ago

athenahealth is hiring a remote Lead Site Reliability Engineer. 💸 Salary: USD 119,000 - 203,000 per year. 📍 Location: USA.

Role Description

We are looking for a Lead Site Reliability Engineer to join our Cloud Engineering division. Cloud Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth’s services.

Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
Lead efforts to continuously improve system availability, fault tolerance, and disaster recovery capabilities.
Ensure proactive incident detection, efficient root cause analysis, and timely resolution of production incidents.
Participate in a 12x7 on-call rotation.
Drive automation efforts to reduce manual intervention and streamline cloud infrastructure management.
Implement Infrastructure as Code (IaC) using tools like Terraform, AWS CloudFormation, and Ansible.
Automate deployment, scaling, and monitoring processes to improve efficiency and reduce operational complexity.
Design and implement monitoring, logging, and alerting solutions to track cloud infrastructure health, performance, and security.
Use observability tools (e.g., Prometheus, Grafana, CloudWatch) to ensure continuous visibility into cloud infrastructure performance and capacity.
Identify bottlenecks and performance issues, proposing and implementing improvements to ensure optimal resource usage.
Ensure that cloud infrastructure is built with security best practices in mind and meets all relevant compliance and regulatory requirements.
Collaborate with security teams to implement security controls and risk mitigation strategies across cloud environments.
Regularly audit and review cloud infrastructure for security vulnerabilities and compliance gaps.
Work closely with development, DevOps, and operations teams to ensure cloud infrastructure aligns with application and business requirements.
Lead and mentor a team of Site Reliability Engineers, promoting best practices and fostering a culture of operational excellence.
Act as a key technical point of contact for cloud-related infrastructure and operations issues.
Lead the incident response efforts for cloud infrastructure-related issues.
Conduct post-incident reviews (PIRs) to identify root causes and implement preventive measures.
Continuously refine incident management processes to reduce downtime and enhance recovery times.

Qualifications

10 years of hands-on experience with cloud automation and configuration management tools (e.g., Terraform, AWS CloudFormation, Ansible, Puppet).
7+ years of experience in a Site Reliability Engineering (SRE), Infrastructure Engineering, or DevOps role, including at least 3 years in a technical leadership capacity.
Deep knowledge of cloud services and technologies (e.g., EC2, S3, Lambda, Kubernetes).
Proficiency in scripting or programming languages (e.g., Python, Go, Bash).
Experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
Familiarity with Continuous Integration/Continuous Deployment (CI/CD) pipelines and cloud-native development practices.
Strong expertise in managing cloud infrastructure (AWS, Google Cloud, Azure) in production environments.
Experience with cloud-native architectures, microservices, and containerized environments (Kubernetes, Docker).
Proven experience in building and managing highly available, scalable, and fault-tolerant systems in the cloud.
Strong understanding of cloud networking, storage, and compute services, on-prem infrastructure, and security best practices.
Strong knowledge of Linux administration and internals.
Effective communication skills, with the ability to translate technical concepts to non-technical stakeholders.

Preferred Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field.
Knowledge of database systems such as MySQL, Oracle or PostgreSQL.
Experience with managing on-prem infrastructure at scale.
Certifications in AWS, Red Hat, or other relevant technologies are a plus.
Experience running containerized workloads (Kubernetes, Docker) in production.

Expected Compensation

$119,000 - $203,000. The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates.

Benefits

Health and financial benefits.
Perks specific to each location, including commuter support, employee assistance programs, tuition assistance, and employee resource groups.
Flexible work-life balance with options for remote work.
Events throughout the year, including book clubs, external speakers, and hackathons.
