Senior Principal Site Reliability Engineer @Akamai
DevOps / Sysadmin
Salary unspecified
Remote Location
Job Type full-time
Posted 1wk ago

1wk ago - Akamai is hiring a remote Senior Principal Site Reliability Engineer. Salary: unspecified. Location: Poland.

Role Description

Are you ready to define the reliability architecture for AI products, from GPU compute to globally distributed inference, and ensure performance and reliability at scale?

The Senior Principal SRE for AI sets technical direction for building, operating, and scaling AI services. Responsibilities include:

  • Writing code, designing systems, and solving complex reliability issues.
  • Mentoring team members, defining technical standards, and promoting engineering best practices.
  • Influencing product engineering teams through exceptional technical expertise.

As a Senior Principal Site Reliability Engineer, you will be responsible for:

  • Defining the reliability architecture for Akamai's AI compute and platform services, including SLO frameworks, fault tolerance patterns, and capacity planning models.
  • Building hands-on automation and tooling that reduce operational toil and scale the SRE team's impact.
  • Designing observability strategy by leveraging Akamai's existing platform to build the telemetry, dashboards, alerts, and GPU-specific monitoring needed for AI workloads.
  • Architecting deployment safety practices including progressive rollouts, canary analysis, rollback automation, and change safety processes.
  • Influencing product engineering architecture and design decisions, embedding reliability into the development lifecycle at the system level.
  • Mentoring and elevating other SREs through design reviews, code reviews, and hands-on problem-solving, setting the technical bar for the team.

Qualifications

  • Extensive experience in SRE, platform engineering, and/or infrastructure engineering, with demonstrated impact at a principal or staff level.
  • Extensive Kubernetes expertise, including autoscaling, resource scheduling, and container orchestration for compute-intensive workloads.
  • Expertise in programming with Python and/or Go, coupled with experience creating production-grade automation, tooling, and platform services.
  • Experience with AI/ML infrastructure, model deployment, or GPU workloads.

Requirements

  • Influence cross-team technical decisions, mentor engineers, elevate technical standards, and collaborate effectively with product engineering teams.
  • Design reliability into innovative platforms at the system level while building influence with product engineering teams through technical expertise.

Benefits

  • FlexBase, Akamai's Global Flexible Working Program, offers 95% of employees the choice to work from home, the office, or both.
  • Opportunities to grow, flourish, and achieve great things.
  • Benefit options designed to meet individual needs for health, finances, family, work-life balance, and personal pursuits.

Before You Apply

Be aware of the location restriction for this remote position: Poland.