Senior II Site Reliability Engineer @Akamai
DevOps / Sysadmin
Salary: unspecified
Location: Remote
Job type: Full-time
Posted: 3d ago

3d ago - Akamai is hiring a remote Senior II Site Reliability Engineer. 💸 Salary: unspecified 📍Location: Poland

Role Description

Do you want to shape reliability practices for a new AI inference platform? Are you a senior technical leader who drives solutions across teams? Join the Akamai Inference Cloud Team.

The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design, implement, deploy and operate AI platforms that enable customers to run inference models and developers to create AI applications.

In this role, you'll lead reliability workstreams for Akamai's serverless inference platform, design SRE tooling and automation, and drive technical decisions. Opportunities exist to:

  • Mentor other SREs
  • Influence architecture decisions with product engineering teams
  • Shape SRE practices for AI inference workloads and GPU infrastructure at scale

As a Senior II SRE, you will be responsible for:

  • Owning the observability strategy: designing telemetry, dashboards, and alerts, defining SLO/SLI frameworks, and implementing improvements when targets are missed
  • Building production-grade automation and tooling that reduces operational toil, improves incident response, and sets patterns that other SREs adopt
  • Owning incident management integration for inference workloads, designing frameworks, leading incident response during on-call rotations, and driving systemic improvements from post-mortems
  • Defining and implementing deployment safety practices including progressive rollouts, canary analysis, and rollback automation, establishing standards for the team
  • Partnering with product engineering teams to influence architecture decisions, ensure operational readiness, and represent the SRE perspective in design reviews
  • Mentoring senior and mid-level SREs through code reviews, design discussions, and hands-on problem-solving

Qualifications

  • Extensive experience in SRE, platform engineering, or infrastructure engineering, working with large-scale distributed systems
  • Track record of defining SLO/SLI frameworks, building observability platforms, and running incident management processes at scale
  • Demonstrated expertise in Kubernetes and containerization, including autoscaling, resource scheduling, and orchestration for compute-intensive workloads at scale
  • Experience building automation and tooling in Python or Go, and working with CI/CD pipelines, deployment safety practices, and infrastructure as code
  • Ability to lead technical initiatives across teams, guide engineers through mentorship, and resolve complex reliability challenges independently with expertise and precision
  • Experience with AI/ML infrastructure, model deployment, or GPU workloads
  • Demonstrated ownership of complex reliability issues, delivering solutions collaboratively and raising the technical expertise of fellow SREs

Benefits

  • Opportunities to grow, flourish, and achieve great things
  • Benefits surrounding all aspects of your life, including:
    • Your health
    • Your finances
    • Your family
    • Your time at work
    • Your time pursuing other endeavors
  • Benefit plan options designed to meet individual needs and budget, both today and in the future

Company Description

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform—from cloud to edge—we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Before You Apply
Be aware of the location restriction for this remote position: Poland
Beware of scams! When applying for jobs, you should NEVER have to pay anything.