Staff Machine Learning Engineer, GenAI Platform @Reddit
AI / ML
Salary USD 253,300 - 354,600
Remote Location
🇺🇸 USA Only
Job Type full-time
Posted 4d ago

[Hiring] Staff Machine Learning Engineer, GenAI Platform @Reddit

4d ago - Reddit is hiring a remote Staff Machine Learning Engineer, GenAI Platform. 💸 Salary: USD 253,300 - 354,600 per year 📍 Location: USA

Role Description

The Machine Learning Platform team at Reddit is a high-impact organization that owns the infrastructure powering recommendations, content discovery, and user quantification. As Generative AI becomes a strategic priority for Reddit, we are expanding our platform to meet the unique demands of foundation models.

As a Staff Software Engineer on the Machine Learning Platform team, you will be a key technical leader architecting and scaling our Generative AI and LLM platform capabilities. Training and deploying foundation models places unprecedented demands on our systems. You will define the technical strategy and build the core infrastructure that enables machine learning engineers and researchers to seamlessly train, evaluate, and iterate on large language models at Reddit scale.

  • Drive GenAI Infrastructure Strategy: Propose, design, and lead the architecture of our next-generation LLM platform, significantly advancing our capabilities to support large-scale foundation models that serve millions of redditors.
  • Design Resilient, Large-Scale Distributed Systems: Architect highly fault-tolerant training infrastructure capable of supporting multi-week, distributed workloads across massive GPU clusters.
  • Build Self-Serve LLM Workflows: Design and implement robust, production-grade pipelines for LLM fine-tuning (e.g., SFT, RLHF/DPO).
  • Develop Comprehensive Evaluation & Benchmarking Infrastructure: Treat model evaluation as a first-class platform capability.
  • Architect Advanced Data Ingestion Pipelines: Extend our distributed data platforms to natively and efficiently handle the massive, multimodal datasets (text, image, video) required for modern GenAI workloads.
  • Provide Technical Leadership & Mentorship: Analyze complex bottlenecks in distributed systems to optimize for performance and cost-efficiency.

Qualifications

  • 10+ years of work experience in a production software development environment or building complex distributed data systems.
  • Degree in ML, Engineering, Computer Science, or a related discipline.
  • Proven track record of designing and operating large-scale ML systems.
  • Hands-on experience managing fault-tolerant, petabyte-scale distributed systems.
  • Deep understanding of modern ML orchestration, fine-tuning pipelines, and model evaluation methodologies.
  • Hands-on practice with CUDA environments, GPU virtualization/containerization, and Kubernetes.
  • Experience with Kubernetes, Docker, and building production-quality, object-oriented code in Python and/or Go.
  • Strong organizational & communication skills.

Requirements

  • GenAI/LLM Infrastructure Expertise: Experience with distributed training frameworks (e.g., FSDP, DeepSpeed, Megatron-LM) and LLM serving/inference optimization (e.g., vLLM, TensorRT-LLM).
  • Distributed Systems Mastery: Experience with multi-node/multi-GPU training clusters.
  • Advanced MLOps Knowledge: Familiarity with tools like Ray, MLflow, or similar ecosystem standards.
  • Strong focus on scalability, reliability, performance, and ease of use.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k with Employer Match
  • Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave

Before You Apply

🇺🇸 Be aware of the location restriction for this remote position: USA Only
‼ Beware of scams! When applying for jobs, you should NEVER have to pay anything.