Site Reliability Engineer II @110 Yahoo Holdings Inc.

Mar 26, 2025 - 110 Yahoo Holdings Inc. is hiring a remote Site Reliability Engineer II. 💸 Salary: $96,000.00 - $200,000.00/yr. 📍Location: USA.

This description summarizes our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

This role involves managing observability (o11y), incident, and on-call solutions to ensure high availability, reliability, and scalability.

  • Support both open-source and SaaS solutions that power Yahoo’s event response life cycle.
  • Focus on enhancing & automating workflows that empower DevOps teams across Yahoo.
  • Solve problems of various complexity both individually and in a team environment.

Key Responsibilities

  • Maintain and improve comprehensive monitoring, alerting, and logging systems (e.g., OpenTSDB, Grafana, Splunk, Chronosphere, BigPanda, Rootly).
  • Enhance o11y guides & documentation to support ongoing service management operations.
  • Ensure 24/7/365 availability, scalability, and incident response for critical applications.
  • Participate in a global on-call rotation.
  • Troubleshoot, resolve, and document production issues, escalating when necessary.
  • Monitor and report performance, availability, and SLA metrics.
  • Work with development teams to enhance, document, and improve system operability.
  • Develop, configure, and manage Terraform-based Infrastructure as Code (IaC) configurations to automate provisioning, scaling, and management of cloud environments.
  • Build CI/CD pipelines and iterate on existing Chef/Ansible templates for application deployments, OS builds, configurations, or upgrades.
  • Modernize infrastructure by performing OS upgrades & migrating services to Kubernetes.
  • Oversee change management coordination with key stakeholders.
  • Develop and support automation scripts and tools for operational efficiency, leveraging AWS and GCP SDKs and APIs.
  • Provide stakeholders with progress updates on shared initiatives (Email, Jira, Slack, Tickets, Git, Meetings).
  • Manage situations of moderate complexity and make timely decisions to ensure smooth operations.
  • Develop business operations workflows for large applications to meet business needs.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science or Engineering, or 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering roles.
  • 2+ years of programming experience in Bash, Python, Java or Go.
  • In-depth knowledge of Linux distributions like RedHat and CentOS; Linux certifications (RHCT, RHCE, LPIC) are a plus.
  • Hands-on experience with AWS core services such as EC2, S3, RDS, EKS, Lambda, and networking services like VPC, Route 53, API GW, and Transit Gateway.
  • Understanding of containerization and orchestration technologies, especially Kubernetes.
  • Strong understanding of networking concepts (DNS, TCP/IP, HTTP/S, Load Balancing) and cloud-native networking in AWS.
  • Experience with CI/CD tools such as GitHub Actions, Jenkins, ArgoCD, Screwdriver.
  • An understanding of IaC concepts, specifically using Terraform.
  • Ability to troubleshoot & resolve hardware, network and software problems.
  • Experience with OSS and/or commercial observability tools like Grafana, New Relic, Datadog, Splunk, Chronosphere, and AWS or GCP native telemetry tools.
  • Strong skills in integrating diverse APIs and web services.
  • Strong troubleshooting skills with a focus on automation, scalability, and resilience.
  • Excellent communication and interpersonal skills.
  • Strong desire to learn new technologies and systems as part of daily work.

Preferred Job Qualifications

  • Knowledge and operational experience running large-scale global distributed systems.
  • Expertise in using Terraform for IaC.
  • Strong expertise in Splunk Cloud and OpenTelemetry.
  • Experience managing multi-region, multi-AZ cloud deployments with a focus on disaster recovery and fault tolerance.
  • Proficient in Slack, Jira & Confluence.

Benefits

  • Flexible hybrid work options.
  • Comprehensive benefits including healthcare, a great 401k, backup childcare, education stipends, and more.

Job Details

  • Category: DevOps / Sysadmin
  • Job type: Full-time