Role Description
At Capital One, we are creating responsible and reliable AI systems, changing banking for good. We are seeking a Senior Distinguished Engineer, a hands-on technical leader passionate about distributed systems, to engineer and scale foundational compute capabilities for our platform.
- Architect and build the control plane and data plane implementations required to realize a highly available, multi-tenant, large-scale, and secure machine learning platform.
- Develop Ray and Spark distributed compute engine solutions that accelerate diverse workloads, from LLM pre-training and reinforcement learning to large-scale data processing, while maximizing compute unit economics.
- Engineer systemic improvements for operational excellence, including automating KTLO (Keep The Lights On) workflows.
- Direct the technical execution of a diverse project portfolio, collaborating with developers whose specialties range from distributed microservices to running large foundation models.
- Work cross-functionally with product and program management, as well as stakeholders and partners across Capital One, to help optimize business outcomes while driving toward strong technology solutions.
- Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal and external technology communities, and leading system design and code review sessions.
- Help elevate the Capital One Distinguished Engineering community and establish yourself as a go-to resource on specific technologies and technology-enabled capabilities.
- Lead the way in developing next-generation talent by mentoring internal talent and actively recruiting external talent to bolster the Capital One tech talent pool.
Qualifications
- Bachelor's degree.
- At least 7 years of experience with application architecture and design patterns.
- At least 5 years of experience with distributed databases, microservice architectures, and high-availability systems.
Requirements
- A degree in Computer Science or a Master's degree in Software Engineering.
- Hands-on experience with the internals of Ray (actors, GCS, scheduling) or Spark (query optimizer, memory management).
- Experience building platforms that support LLM training, fine-tuning, or high-throughput inference.
- Hands-on experience with AWS-specific compute primitives (EKS, EC2 UltraClusters, Graviton) and cost-optimization strategies.
Benefits
- Comprehensive, competitive, and inclusive set of health, financial, and other benefits that support your total well-being.
- Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI).