[Hiring] Sr. Program Manager, Incident Management @Zapier
Project Management
Salary usd 174,200 - 2..
Employment Type full-time
Posted 2wks ago

2wks ago - Zapier is hiring a remote Sr. Program Manager, Incident Management. 💸 Salary: USD 174,200 - 261,300 per year 📍 Location: USA, Canada, Brazil, Colombia, Argentina, Chile, Venezuela, Mexico, Peru, Uruguay

Role Description

As Zapier expands into the enterprise market, operational rigor matters more than ever. The Sr. Program Manager will own the end-to-end incident management program for Zapier's Product and Engineering organization: response, post-incident learning and actions, and everything in between. You'll report to the Director of Engineering for Internal Platforms & Infrastructure and be the DRI for the program's design, execution, and outcomes. You build the program and leverage AI to scale its impact.

About You

  • You have deep incident management experience and you've moved beyond just executing it.
  • You've built and led incident response programs, post-incident processes, SRE practices, or reliability-focused work.
  • You know incident management deeply enough to rethink it, not just replicate it.
  • You've ideally done 0-to-1 work in this space: stood up programs, defined standards, trained responders.
  • You re-engineer how work happens based on where AI is headed.
  • You've created repeatable systems (workflows, agents, copilots, or automation) that fundamentally changed how work gets done.
  • You use AI-native tools (Cursor, Claude Code, or similar) as your default, and orchestrate them into durable capabilities that compound over time.
  • You have a forward-looking thesis on how AI will reshape your domain and you've already acted on it.
  • You can quantify the impact on velocity, quality, or organizational capacity.
  • You iterate, refine, and critically evaluate AI outputs, embedding quality standards and accountability into the systems you build.
  • You're a builder, not a specialist.
  • You have deep expertise in incident management, but you're not rigidly attached to how you've done it before.
  • You can stretch into adjacent areas (reliability strategy, enterprise readiness, operational tooling) as the role evolves.
  • You build durable systems that work without you.
  • You bring an upstream, systems mindset.
  • You instinctively look for root causes and design solutions that scale beyond your immediate program.
  • You understand how the full incident lifecycle (prevention, detection, response, learning) supports customer trust and enterprise readiness.
  • You influence without authority.
  • You shape outcomes by building trust.
  • You know how to build coalitions across engineering, support, security, GTM, and leadership.
  • You lead change, not just implement it.
  • You anticipate resistance, adapt your approach, and help others adopt new ways of working.
  • You have technical empathy.
  • You can go toe-to-toe with engineers, support leads, and product leaders to clarify the "why" behind technical tradeoffs and incident decisions.
  • You understand the role of observability (logs, metrics, traces), SLOs, and thresholds in incident response and prevention.
  • You bias for velocity and clarity.
  • You act decisively even in high ambiguity.
  • You communicate with relentless clarity: context and intent early, often, and candidly, especially when it's uncomfortable.
  • You're analytical and hands-on with data.
  • You can work directly with data tools (e.g., Databricks, SQL) to build rich reporting and meaningful insights.
  • You understand incident tooling (incident.io or similar) and how it integrates with Slack, PagerDuty, and on-call workflows.
  • You work well remotely.

Things You'll Do

  • Own the incident program.
  • Lead the design, evolution, and governance of incident processes across the Build organization.
  • Ensure workflows are consistent, auditable, and aligned with enterprise expectations.
  • Build AI-powered incident systems.
  • Design and ship repeatable AI tools: automated incident summarization, intelligent severity classification, AI-assisted root cause analysis, postmortem draft generation, and more.
  • Accelerate decisions.
  • Create clarity in ambiguity, align stakeholders, and drive decisions across teams and zones.
  • Surface and resolve systemic issues.
  • Identify recurring org friction, drive root-cause solutions, and implement fixes that persist beyond individual incidents.
  • Build and maintain reporting.
  • Build, maintain, and refine dashboards and reports using Databricks, Looker, and related tools.
  • Raise the bar.
  • Instill rigor and accountability.
  • Coach responders and incident roles (Incident Commander, Support Leads, and new roles as they emerge).
  • Produce and maintain clear documentation (playbooks, templates, guides) and deliver training for all incident roles and stakeholder groups.
  • Partner cross-functionally.
  • Collaborate with engineering leads, EMs, product, support, security, GTM, and leadership to strengthen practices.
  • Step in when needed.
  • Step into incident response roles during business hours as appropriate to experience the work firsthand and inform program improvements.

Our Stack & Tools

  • Incident tooling: incident.io, PagerDuty, Slack, Zendesk
  • Data & Reporting: Databricks, Grafana, Looker
  • Observability context: Datadog, Grafana, Prometheus, OpenSearch
  • Infra context: AWS, Kubernetes, Terraform (with SRE/Platform partners)
  • Collaboration: GitLab, Coda, Google Workspace

What Success Looks Like

  • The incident program is dependable and normalized.
  • Internal teams feel supported.
  • Workflows run consistently with low friction.
  • Systemic improvements persist.
  • Data quality is rich and trusted.
  • Outcomes improve measurably.
  • You're a force multiplier.

Application Deadline

The anticipated application window is 30 days from the date the job is posted, unless the number of applicants requires it to close sooner or later, or the position is filled.
