Site Reliability Engineer II @110 Yahoo Holdings Inc.

Mar 26, 2025 - 110 Yahoo Holdings Inc. is hiring a remote Site Reliability Engineer II. 💸 Salary: $96,000.00 - $200,000.00/yr. 📍Location: USA.

This description is a summary of our understanding of the job description.

Role Description

This role involves managing observability (o11y), incident, and on-call solutions to ensure high availability, reliability, and scalability.

Support both open-source and SaaS solutions that power Yahoo’s event response lifecycle.
Focus on enhancing and automating workflows that empower DevOps teams across Yahoo.
Solve problems of varying complexity, both individually and in a team environment.

Key Responsibilities

Maintain and improve comprehensive monitoring, alerting, and logging systems (e.g., OpenTSDB, Grafana, Splunk, Chronosphere, BigPanda, Rootly).
Enhance observability guides and documentation to support ongoing service management operations.
Ensure 24/7/365 availability, scalability, and incident response for critical applications.
Participate in a global on-call rotation.
Troubleshoot, resolve, and document production issues, escalating when necessary.
Monitor and report performance, availability, and SLA metrics.
Work with development teams to enhance, document, and improve system operability.
Develop, configure, and manage Terraform-based Infrastructure as Code (IaC) configurations to automate provisioning, scaling, and management of cloud environments.
Build CI/CD pipelines and iterate on existing Chef/Ansible templates for application deployments used for OS builds, configurations, or upgrades.
Modernize infrastructure by performing OS upgrades and migrating services to Kubernetes.
Oversee change management coordination with key stakeholders.
Develop and support automation scripts and tools for operational efficiency, leveraging AWS and GCP SDKs and APIs.
Provide stakeholders with progress updates on shared initiatives (email, Jira, Slack, tickets, Git, meetings).
Manage situations of moderate complexity and make timely decisions to ensure smooth operations.
Develop business operations workflows for large applications to meet business needs.

Qualifications

Bachelor’s or Master’s degree in Computer Science or Engineering, or 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering roles.
2+ years of programming experience in Bash, Python, Java, or Go.
In-depth knowledge of Linux distributions like Red Hat and CentOS; Linux certifications (RHCT, RHCE, LPIC) are a plus.
Hands-on experience with AWS core services such as EC2, S3, RDS, EKS, and Lambda, and networking services like VPC, Route 53, API Gateway, and Transit Gateway.
Understanding of containerization and orchestration technologies, especially Kubernetes.
Strong understanding of networking concepts (DNS, TCP/IP, HTTP/S, Load Balancing) and cloud-native networking in AWS.
Experience with CI/CD tools such as GitHub Actions, Jenkins, ArgoCD, Screwdriver.
An understanding of IaC concepts, specifically using Terraform.
Ability to troubleshoot and resolve hardware, network, and software problems.
Experience with OSS and/or commercial observability tools like Grafana, New Relic, Datadog, Splunk, Chronosphere, and AWS or GCP native telemetry tools.
Strong skills in integrating diverse APIs and web services.
Strong troubleshooting skills with a focus on automation, scalability, and resilience.
Excellent communication and interpersonal skills.
Strong desire to learn new technologies and systems as part of daily work.

Preferred Job Qualifications

Knowledge and operational experience running large-scale global distributed systems.
Expert-level experience using Terraform for IaC.
Strong expertise in Splunk Cloud and OpenTelemetry.
Experience managing multi-region, multi-AZ cloud deployments with a focus on disaster recovery and fault tolerance.
Proficient in Slack, Jira, and Confluence.

Benefits

Flexible hybrid work options.
Comprehensive benefits including healthcare, a great 401k, backup childcare, education stipends, and more.

Before You Apply

📍 Be aware of the location restriction for this remote position: USA
‼ Beware of scams! When applying for jobs, you should NEVER have to pay anything.

Category: DevOps / Sysadmin
Job Type: Full-time