Role Description
The Senior Director / VP of AI Production Reliability & Trust is accountable for two things:
- Production reliability: ensuring HAP (our Hybrid AI Platform) and our customer deployments operate dependably under real traffic, real data, and real regulatory constraints.
- Agent trust: building the frameworks, technical and operational, that allow enterprise customers to trust autonomous AI agents doing work on their behalf.
This is not a pre-release testing role. It is not a test automation role. It is not a QA team management role. This is a runtime governance role for non-deterministic, agentic AI systems. The systems you govern make decisions autonomously. The customers who depend on them are sovereign governments and large enterprises with no tolerance for unpredictable agent behavior.
What you will actually build:
- A production quality operating system: quality gates, phase-transition criteria, an incident taxonomy, and an observability spec across our 6-layer Reference Architecture.
- A continuous validation framework for agentic workflows: not test scripts run by humans, but autonomous evaluation pipelines that catch regressions without human intervention.
- An agent decision qualification framework: risk-tiered oversight for autonomous agent decisions, from ephemeral actions that need no review to high-stakes decisions that require multi-model consensus.
- A trust evidence system: the observable signals (audit trails, behavioral consistency records, policy compliance evidence) that enterprise customers use to extend trust to agents operating on their behalf.
- Production observability: instrumentation across the Ingest, Prepare, Serve, Orchestrate, Monitor, and Optimize layers of the Reference Architecture.
- A post-mortem and CAPA system: every production incident produces a root cause, a corrective action, and a new test that prevents recurrence.
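To make the risk-tiered qualification idea concrete, here is a minimal illustrative sketch. Everything in it (the tier names, the `Decision` shape, the `quorum` parameter) is a hypothetical simplification, not a description of HAP internals:

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskTier(Enum):
    """Illustrative oversight tiers for autonomous agent decisions."""
    EPHEMERAL = "ephemeral"      # transient, reversible actions: no review
    STANDARD = "standard"        # automated policy check before execution
    HIGH_STAKES = "high_stakes"  # requires multi-model consensus


@dataclass
class Decision:
    action: str
    tier: RiskTier
    model_votes: list = field(default_factory=list)  # 1 = approval from an independent reviewer model


def qualify(decision: Decision, policy_ok: bool = True, quorum: int = 2) -> bool:
    """Return True if the decision may execute under its tier's oversight rule."""
    if decision.tier is RiskTier.EPHEMERAL:
        return True
    if decision.tier is RiskTier.STANDARD:
        return policy_ok
    # High-stakes: policy compliance plus at least `quorum` independent model approvals.
    return policy_ok and sum(decision.model_votes) >= quorum
```

The point of the sketch is the shape of the gate, not the implementation: cheap, reversible actions flow through untouched, while high-stakes actions only execute once independent evaluators agree.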
Qualifications
- 10+ years in quality, reliability, or production operations for complex distributed systems, with at least some of that time governing AI or ML systems in live production.
- Direct implementation experience with AI quality frameworks: you built it, not just led a team that built it.
- Familiarity with the agentic AI quality problem: non-deterministic systems, hallucination detection, behavioral drift, autonomous decision governance.
- Working knowledge of open-source evaluation and observability frameworks (LangSmith, Arize/Phoenix, RAGAS, PromptFlow, Weights & Biases, or similar), not just commercial alternatives.
- Background in regulated industries (financial services, telecom, healthcare, government) where AI quality failures have real contractual and commercial consequences.
- Startup orientation: comfortable with ambiguity, iterative scope, and a team that moves faster than most people expect.
What We're Not Looking For
- Leaders who will hire a team first and direct them to build; we need someone who builds first and delegates second.
- Candidates whose answer to "how would you do X?" is "I'd talk to my network" or "I'd evaluate vendors"; we need someone who already has answers.
- Enterprise QA professionals whose toolkit is Selenium, Datadog, LoadRunner, or similar AI-washed commercial tools; we use open-source and next-gen frameworks and we expect you to know them.
- Candidates whose production AI experience means "I oversaw a team monitoring an AI model"; we need someone who has implemented governance for autonomous agent systems.
- People who need defined scope and predictable hours to do their best work.