Freelance Agent Evaluation Engineer - AI Projects on Mindrift @Mindrift
Artificial Intelligence
Salary: USD 50 per hour
Remote Location: Worldwide
Employment Type: Part-time
Posted: 2 months ago

2 months ago - Mindrift is hiring a remote Freelance Agent Evaluation Engineer - AI Projects on Mindrift. 💸 Salary: USD 50 per hour 📍 Location: Worldwide

Role Description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves:

  • Building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.
  • Creating challenging tasks and evaluation criteria within realistic simulated environments:
    • Build virtual companies following a high-level plan - codebase, infrastructure, and context (conversations, documentation, tickets) that form a realistic environment with development history.
    • Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair.
    • Design tasks set in isolated environments - emulations of a developer's workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase.
    • Write tests that accept all correct solutions and reject incorrect ones - neither too strict (breaking on valid approaches) nor too lenient (passing bad ones).
    • Iterate with an AI agent on tests - verifying they catch real problems, don't miss bad solutions, and don't break on good ones.
    • Review code written by agents, analyze why an agent failed or succeeded, and design edge cases and adversarial scenarios.
    • Iterate based on feedback from expert QA reviewers who score your work on quality criteria.
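
To make the "neither too strict nor too lenient" requirement concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration and not taken from the project: the `slugify`-style task, the three candidate solutions, and the check functions are all assumed names. It contrasts a behavioral check with one that passes a bad solution and one that rejects a valid approach.

```python
import re

# Hypothetical task: turn a title into lowercase words joined by single hyphens.

def solution_regex(title: str) -> str:
    """A correct solution built on a regular expression."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def solution_split(title: str) -> str:
    """A different, equally correct solution that avoids regular expressions."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return "-".join(cleaned.split())

def solution_buggy(title: str) -> str:
    """An incorrect solution: ignores punctuation and repeated separators."""
    return title.lower().replace(" ", "-")

# Input/expected pairs that describe the required behavior (assumed for this sketch).
CASES = [
    ("Hello World", "hello-world"),
    ("  Fix: login bug!  ", "fix-login-bug"),
    ("already-slugged", "already-slugged"),
    ("A  B", "a-b"),
]

def fair_check(candidate) -> bool:
    """Compares outputs only, so any approach with correct behavior passes."""
    return all(candidate(raw) == expected for raw, expected in CASES)

def too_lenient_check(candidate) -> bool:
    """Covers only the happy path, so the buggy solution slips through."""
    return candidate("Hello World") == "hello-world"

def too_strict_check(candidate) -> bool:
    """Also demands a call to something named `sub`, rejecting valid non-regex code."""
    return "sub" in candidate.__code__.co_names and fair_check(candidate)

if __name__ == "__main__":
    for check in (fair_check, too_lenient_check, too_strict_check):
        for solution in (solution_regex, solution_split, solution_buggy):
            print(f"{check.__name__:18} {solution.__name__:15} {check(solution)}")
```

In this sketch only `fair_check` separates the two correct solutions from the buggy one. The lenient check accepts all three, and the strict check fails `solution_split` purely because of how it is implemented.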

What this is NOT:

  • Not data labeling.
  • Not prompt engineering.
  • Not writing code from scratch - the agent writes most of the code; you guide and evaluate.
  • A significant part of the work is done together with AI - it's very hard to create tasks that challenge frontier models without using frontier models.

Qualifications

  • Degree in Computer Science, Software Engineering, or related fields.
  • 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations).
  • Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems.
  • Experience writing tests (functional, integration) - not just running them.
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis).
  • CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results).
  • English proficiency - B2.

Requirements

  • You don't need to be an expert in every item, but you should be comfortable reading and reasoning about code across the stack.

Benefits

  • Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work.
  • Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.
  • On this project, contributors can earn up to $50 per hour equivalent, depending on their level and pace of contribution.
  • Compensation varies across projects depending on scope, complexity, and required expertise.
  • Please note that other projects on the platform may offer different earning levels based on their requirements.