Role Description
In this role, you will work on projects that improve and evaluate large language models by crafting challenging, competition-level mathematics problems and rigorously assessing model reasoning.
The ideal candidate has a strong foundation in competitive mathematics at the AIME, HMMT, and IMO (Olympiad) level across the four classic pillars: Algebra, Number Theory, Combinatorics, and Geometry.
You should be able to design novel, "Google-proof" problems intended to expose deep reasoning deficiencies in state-of-the-art models, and to diagnose precisely where and why a model's reasoning breaks down.
The role combines original problem authoring, rigorous solution writing, and detailed evaluation of model-generated responses.
This is your chance to future-proof your career in an AI-first world by working at the frontier of mathematical reasoning evaluation.
What does the day-to-day look like?
- Design original, challenging mathematics problems at AIME, HMMT, and IMO difficulty that test the reasoning limits of large language models in multi-step, abstract settings, drawn strictly from Algebra, Number Theory, Combinatorics, or Geometry.
- Author novel prompts that "break" evaluated models, meaning the model arrives at an incorrect final answer; ensure problems cannot be bypassed via brute-force or computationally intensive methods.
- Solve problems independently and write detailed, logically structured, self-contained solutions with clear justifications, properly rendered in LaTeX.
- Review model-generated solutions, identify mathematical errors, logical fallacies, or missing arguments, and diagnose the root cause using defined failure categories (Final Answer, Reasoning Steps, Instruction Following).
- Contribute to defining new evaluation benchmarks across competition and Olympiad-level mathematics curricula.
- Classify each prompt accurately by domain, sub-domain, topic, and proficiency level within the labeling tool.
Qualifications
- Mathematical Expertise: Strong command of competitive mathematics at the level of AIME, HMMT, and IMO across Algebra, Number Theory, Combinatorics, and Geometry.
- Writing Proficiency: Excellent structured written communication, including fluency with standard LaTeX delimiters for all mathematical expressions.
- Analytical Skills: Strong research and analytical skills, with the ability to construct rigorous, proof-based reasoning.
- Creative Thinking: Creative and lateral thinking abilities to design novel problems that are not adapted from existing competitions or online repositories.
- Feedback Skills: Ability to provide constructive feedback, precise annotations, and accurate error diagnosis on model outputs.
- Independence: Self-motivated and able to work independently in a remote setting.
- Technical Setup: A desktop or laptop with a reliable internet connection.
Requirements
- Candidates pursuing or holding a Bachelor's/Master's degree in Mathematics, Applied Mathematics, Statistics, Engineering, or a related field are eligible and encouraged to apply.
- Prior experience in competitive mathematics (e.g., national or international Olympiads or equivalent competitive examinations) as a participant, coach, or problem setter is a bonus.
- Ability to analyze and solve complex problems with a structured, logical approach and to express solutions clearly and rigorously.
Benefits
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.
Offer Details
- Commitments Required: At least 8 hours per day and 40 hours per week, with 4 hours of overlap with PST.
- Engagement type: Contractor assignment/freelancer (no medical/paid leave).