Role Description
Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture. Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
Outcomes
- Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
- Complete the AWS multi-account migration: move production workloads to an isolated account with zero unplanned downtime.
- Deliver a SOC 2 Type I audit-ready infrastructure evidence package: own the technical controls implementation end-to-end.
- Version and publish the Terraform module library (30+ modules) to a private registry to eliminate ad hoc git consumption by product teams.
- Implement automated deployment rollback for ECS and Lambda: gate production on integration test passage.
- Stand up monthly cost reporting to leadership: budget anomaly detection, savings plan recommendations, spend by service/team/environment.
Qualifications
- 5+ years of production AWS infrastructure experience with deep Terraform expertise.
- Hands-on experience building an SRE function from scratch, with complete ownership of it.
- Experience at a multi-site company where PaaS or microservices are required.
- CI/CD pipeline ownership in one or more previous roles.
- Experience with PagerDuty and with standing up an on-call rotation.
- 5+ years hands-on with AWS, Terraform, CI/CD pipeline ownership, and SRE tooling (OpenTelemetry, Grafana, PagerDuty or equivalent) in a production environment.
Requirements
- Ownership orientation: You don't wait to be assigned a problem. If something is broken, undocumented, or a risk, you flag it and fix it. If the runbooks don't exist yet, you write them.
- Documentation discipline: You write things down. Runbooks, decision rationale, architecture patterns, incident post-mortems. The next person should be able to understand your work without asking you.
- Cost consciousness: You think about the business impact of infrastructure decisions. You can explain a spending anomaly to a CFO in plain language. You know what things cost before you build them.
- Calm under pressure: Production incidents happen. You triage clearly, communicate proactively with technical and non-technical stakeholders, and run a tight post-mortem without blame. You've been woken up at 3am. You can handle it.
- Cross-functional communication: You can work with product engineers, legal/compliance, and executive leadership in the same week without switching communication modes awkwardly. You speak both engineer and business.
- Proactive reliability: A good SRE reacts to outages. A great SRE catches degradation before it becomes an outage. You build alerting against the patterns, not just the failures.
Benefits
- Base salary: $160,000 to $190,000 per year, paid semi-monthly.
- Total compensation includes a base salary, profit-sharing bonus, and competitive benefits.
- Performance-driven company: future increases will be based on company and personal performance, not annual cost-of-living adjustments.