Staff Research Engineer, Post-training & Evaluation @Reddit
Artificial Intelligence
Salary usd 230,000 - 322,000
Remote Location
🇺🇸 USA Only
Employment Type full-time
Posted 2wks ago

2wks ago - Reddit is hiring a remote Staff Research Engineer, Post-training & Evaluation. 💸 Salary: USD 230,000 - 322,000 per year 📍 Location: USA

Role Description

The AI Engineering team at Reddit is building our own Reddit-native foundational Large Language Models (LLMs). This team sits at the intersection of applied research and massive-scale infrastructure, training models that truly understand the unique culture, language, and structure of Reddit communities. You'll join a team of distinguished engineers and researchers building the "engine room" of Reddit's AI future: the foundational models that power Safety & Moderation, Search, Ads, and the next generation of consumer products.

As a Staff Research Engineer for Post-Training & Evaluation Science, you will own the science of our model development "feedback loop." While pre-training builds the base models, you define how we measure whether those models are safe, smart, and "Reddit-native," and you set the post-training methodology that turns base checkpoints into high-performing endpoints. You will define the Reddit Benchmark (our internal standard for rigorous model quality across both generation and representation) and own the evaluation science on which the rest of the org's iteration depends.

Responsibilities

  • Define the "Reddit Benchmark" evaluation standard:
    • Own the methodology for rigorously measuring model quality across Safety, Reasoning, representation/retrieval, and Reddit-specific knowledge.
    • Decide what "Reddit-native" means in measurable terms and set the bar the org trains against.
  • Own evaluation reliability and statistical rigor:
    • Establish the science behind trustworthy evals: judge variance, multi-sample scoring, inter-rater/inter-sample agreement, sampling and temperature effects, and calibration of automated judges.
    • Drive the practice of evaluation as a release gate, both offline against frozen datasets and pre-merge in CI/CD.
  • Design model-as-a-judge methodology:
    • Own judge selection, prompt design, calibration, and reliability for automated evaluation using frontier external models.
  • Set post-training recipes and strategy:
    • Design SFT recipes (data mixtures, curriculum, ablation strategy) that convert base models into helpful, well-aligned endpoints.
    • Partner with engineering to scale them.
  • Evaluate base and CPT checkpoints, not just endpoints:
    • Design checkpoint-selection methodology across CPT experiments and learning-rate (LR) studies.
  • Drive synthetic data generation strategy:
    • Define and curate high-quality instruction and evaluation sets to improve generalization where human data is scarce.
  • Partner with Safety Engineering:
    • Translate high-level safety policy into concrete classification metrics, probe sets, and CI/CD unit tests.
  • Diagnose post-training instability:
    • Dive into loss curves and eval logs to identify alignment tax and capability degradation, and recommend the fix.
  • Lead research direction:
    • Set technical direction for evaluation and post-training across the team, mentor engineers and scientists, and represent the work internally (and externally where appropriate).

Qualifications

  • 6+ years of professional ML experience (or a PhD plus 4+ years) with a direct focus on LLM post-training and evaluation.
  • PhD or MS in CS, ML, NLP, IR, or a related quantitative field, or equivalent industry research experience.
  • Deep expertise in evaluation reliability: judge/sample variance, multi-sample scoring, calibration, statistical significance, and the failure modes of automated evaluation.
  • Strong experience building custom, domain-specific evaluation harnesses.
  • Experience evaluating both generation and representation/classification.
  • Deep understanding of Continuous Pre-training (CPT), Instruction Tuning (SFT), and how data quality shapes model behavior.
  • Fluency in Python; strong data-pipeline and eval-harness engineering.

Requirements

  • Experience with MLflow or similar experiment-tracking frameworks.
  • Familiarity with modern fine-tuning frameworks and PyTorch-native training stacks.
  • Experience with synthetic data generation techniques.
  • Experience with preference optimization.
  • Publications at NLP/ML venues, FAccT, or related venues, or other evidence of research leadership.
  • Experience evaluating multimodal models.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k with Employer Match
  • Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave

Before You Apply

🇺🇸 Be aware of the location restriction for this remote position: USA Only
‼ Beware of scams! When applying for jobs, you should NEVER have to pay anything.