Role Description
We are seeking a DevOps/SRE Team Lead with proven, hands-on Kubernetes expertise to drive the reliability and scalability of our video processing infrastructure and oversee a small team of SREs and DevOps Engineers. This is a deeply technical lead role, requiring real-world experience administering production Kubernetes clusters—not theoretical familiarity. You will own CI/CD pipelines, infrastructure automation, and cloud platform operations in a fully remote environment where independent execution is essential.
You will spend 70-80% of your day being hands-on in the following areas:
- Design, deploy, and administer production Kubernetes clusters, including workload scheduling, namespace management, RBAC, network policies, and cluster upgrades.
- Design and maintain continuous integration/deployment pipelines to automate testing and deployment, including Kubernetes-native delivery workflows using Helm and ArgoCD or equivalent.
- Track software performance, fix errors, troubleshoot systems, and implement preventive measures to ensure smooth workflows.
- Implement and manage infrastructure.
- Utilize Terraform or CloudFormation for IaC management.
- Optimize cloud resources by implementing cost-effective solutions.
- Collaborate with various teams to ensure smooth deployment.
- Monitor system performance and create new processes based on performance analysis.
- Implement security best practices, including automated compliance checks and secure code deployment.
You will spend 20-30% of your time managing the following areas:
- Manage the technical roadmap and architecture while mentoring SRE and DevOps Engineers (player/coach).
- Hire, coach, and manage a team of DevOps engineers and Site Reliability Engineers.
- Communicate clearly, resolve conflicts, and influence without authority.
- Define DevOps/Platform roadmap aligned with business goals (e.g., cloud cost optimization, automation maturity).
- Collaborate effectively with engineering and business teams.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent.
- 5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
- Hands-on experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or equivalent) with direct integration into Kubernetes deployment workflows.
- Production-level experience with infrastructure as code (Terraform required; CloudFormation or Pulumi a plus), including managing cloud-hosted Kubernetes clusters (EKS, GKE, or AKS).
- Experience with monitoring, logging, and observability tooling in Kubernetes environments (Prometheus, Grafana, Datadog, ELK/EFK stack, or equivalent); ability to build dashboards and alerts from scratch, not just consume existing ones.
- Demonstrated, hands-on Kubernetes experience in production environments: cluster administration, Helm chart authoring and management, RBAC configuration, persistent storage, horizontal/vertical pod autoscaling, and diagnosing and resolving real production failures (CrashLoopBackOff, OOMKilled, networking issues, etc.).
- Strong troubleshooting skills with the ability to diagnose infrastructure and application issues live, under pressure, without reference materials; this is evaluated directly in our interview process.
- Proficiency in scripting languages (Python, Go, Bash, or PowerShell); ability to write and own automation scripts, not just modify existing ones.
Benefits
- Day-one medical, dental & vision coverage.
- 100% company-paid life + disability insurance.
- 401(k) with a sweet company match (up to 8%).
- Quarterly HSA boosts & flexible spending accounts.
- Flexible time off (salaried) or PTO (hourly) + generous paid holidays.
- Pet insurance (yes, your dog gets benefits too).
- Legal plan + extras like accident & critical illness coverage.