Staff Research Engineer, Post-training & Evaluation @Reddit

Artificial Intelligence

Salary usd 230,000 - 3..	Remote Location 🇺🇸 USA Only
Employment Type full-time	Posted 2wks ago

Posted 2 weeks ago: Reddit is hiring a remote Staff Research Engineer, Post-training & Evaluation. 💸 Salary: USD 230,000 - 322,000 per year. 📍 Location: USA.

Role Description

The AI Engineering team at Reddit is building our own Reddit-native foundational Large Language Models (LLMs). This team sits at the intersection of applied research and massive-scale infrastructure, training models that truly understand the unique culture, language, and structure of Reddit communities. You'll join a team of distinguished engineers and researchers building the "engine room" of Reddit's AI future — the foundational models that power Safety & Moderation, Search, Ads, and the next generation of consumer products.

As a Staff Research Engineer for Post-Training & Evaluation Science, you will own the science of our model development "feedback loop." While pre-training builds the base models, you define how we measure whether those models are safe, smart, and "Reddit-native," and you set the post-training methodology that turns base checkpoints into high-performing endpoints. You will define the Reddit Benchmark — our internal standard for rigorous model quality across both generation and representation — and own the evaluation science that the rest of the org's iteration depends on.

Responsibilities

Define the "Reddit Benchmark" evaluation standard:

Own the methodology for rigorously measuring model quality across Safety, Reasoning, representation/retrieval, and Reddit-specific knowledge.
Decide what "Reddit-native" means in measurable terms and set the bar the org trains against.

Own evaluation reliability and statistical rigor:

Establish the science behind trustworthy evals — judge variance, multi-sample scoring, inter-rater/inter-sample agreement, sampling and temperature effects, and calibration of automated judges.
Drive the practice of evaluation as a release gate — offline against frozen datasets, and pre-merge in CI/CD.

Design model-as-a-judge methodology:

Own judge selection, prompt design, calibration, and reliability for automated evaluation using frontier external models.

Set post-training recipes and strategy:

Design SFT recipes (data mixtures, curriculum, ablation strategy) that convert base models into helpful, well-aligned endpoints.
Partner with engineering to scale them.

Evaluate base and CPT checkpoints, not just endpoints:

Design checkpoint-selection methodology across CPT experiments and LR studies.

Drive synthetic data generation strategy:

Define and curate high-quality instruction and evaluation sets to improve generalization where human data is scarce.

Partner with Safety Engineering:

Translate high-level safety policy into concrete classification metrics, probe sets, and CI/CD unit tests.

Diagnose post-training instability:

Dive into loss curves and eval logs to identify alignment tax and capability degradation, and recommend the fix.

Lead research direction:

Set technical direction for evaluation and post-training across the team, mentor engineers and scientists, and represent the work internally (and externally where appropriate).

Qualifications

6+ years of professional ML experience (or PhD + 4+) with a direct focus on LLM post-training and evaluation.
PhD or MS in CS, ML, NLP, IR, or a related quantitative field — or equivalent industry research experience.
Deep expertise in evaluation reliability: judge/sample variance, multi-sample scoring, calibration, statistical significance, and the failure modes of automated evaluation.
Strong experience building custom, domain-specific evaluation harnesses.
Experience evaluating both generation and representation/classification.
Deep understanding of Continuous Pre-training (CPT), Instruction Tuning (SFT), and how data quality shapes model behavior.
Fluency in Python; strong data-pipeline and eval-harness engineering.

Requirements

Experience with MLflow or similar experiment-tracking frameworks.
Familiarity with modern fine-tuning frameworks and PyTorch-native training stacks.
Experience with synthetic data generation techniques.
Experience with preference optimization.
Publications in NLP/ML/FAccT or related venues, or other evidence of research leadership.
Experience evaluating multimodal models.

Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs
401k with Employer Match
Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave
