[Hiring] AI Benchmark Engineer | Native Language Specialist @LILT (Production)
Artificial Intelligence
Salary: unspecified
Remote Location: Worldwide
Employment Type: Contract
Posted: 1 month ago

1 month ago - LILT (Production) is hiring a remote AI Benchmark Engineer | Native Language Specialist. Salary: unspecified. Location: Worldwide.

Role Description

We are building a rigorous, verifiable evaluation suite of Terminal-Bench tasks designed to test the limits of large language models on multilingual software challenges. Our goal is to measure multilingual robustness across prompt language effects, non-English data processing, and complex locale/encoding edge cases in terminal workflows.

We are seeking experienced native-speaking software engineers to design, build, and validate these benchmarks. You will create high-signal, high-quality tasks that genuinely test a model's ability to handle multilingual environments without relying on English translation crutches.

Note: this is a remote, freelance opportunity.

What You'll Deliver

  • Task Engineering: Design and build tasks that evaluate coding agents.
  • Asset Creation: Build realistic task environments using datasets and files in your native language.
  • Prompting & Translation: Find the failure points where AI models break down when prompted in your native language.
  • Implementation & Verification: Support the development of robust solutions (reference implementations) and write highly reliable, deterministic verifier scripts (using rubric-based judging only when strictly necessary).
  • Calibration & Execution: Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers (Haiku, Sonnet, Opus).
  • Quality Assurance: Participate in a rigorous, 4-layer human quality control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks to ensure fairness, grammatical accuracy, and benchmark integrity.
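
To make the "deterministic verifier" deliverable above concrete, here is a minimal sketch in Python. It is not LILT's or Terminal-Bench's actual verifier format: the function names (`normalize_lines`, `verify`) and the idea that a task produces a single UTF-8 text output are assumptions made for illustration. The point is that the check depends only on its inputs, with no locale settings, clocks, or randomness involved.

```python
import unicodedata


def normalize_lines(text: str) -> list[str]:
    """Canonicalize text so incidental differences do not affect the verdict."""
    # NFC makes composed and decomposed spellings of the same character equal.
    text = unicodedata.normalize("NFC", text)
    # Ignore trailing whitespace and blank lines (hypothetical task policy).
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def verify(actual_bytes: bytes, expected: str) -> bool:
    """Return True only if the output is valid UTF-8 and matches line by line."""
    try:
        actual = actual_bytes.decode("utf-8")  # strict decoding, no fallback
    except UnicodeDecodeError:
        return False  # wrong encoding counts as a failed task
    return normalize_lines(actual) == normalize_lines(expected)
```

A verifier like this returns the same verdict on every run and on every machine, which is what allows difficulty calibration across model tiers to be compared fairly.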

Qualifications

  • 5+ years of industry experience in software engineering.
  • Proven track record at leading technology companies and/or graduation from top-tier engineering universities.
  • Native or near-native fluency in the target language, with a deep understanding of its grammar, register, and phrasing rules. High English proficiency.
  • Strong proficiency in Python, standard shell scripting, and data processing.
  • Extensive experience with Terminal/CLI-based development workflows and a working familiarity with coding agents.
  • Deep technical understanding of multilingual text processing pitfalls, including:
    • Encoding/decoding robustness and Unicode normalization.
    • Locale-dependent conventions (collation, casing, non-Gregorian dates).
    • Text I/O, toolchain interoperability, and safe string operations.
    • Bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts.
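
A few of the pitfalls listed above can be shown in a short Python sketch. The helper names (`same_text`, `caseless_equal`, `mojibake`) are hypothetical, and the caseless comparison is a simplification rather than a full implementation of Unicode caseless matching.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_equal(a: str, b: str) -> bool:
    """Simplified caseless comparison: casefold first, then normalize."""
    fold = lambda s: unicodedata.normalize("NFC", s.casefold())
    return fold(a) == fold(b)


def mojibake(text: str) -> str:
    """Simulate a toolchain that reads UTF-8 bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# Unicode normalization: precomposed vs. "e" + combining acute accent.
print("caf\u00e9" == "cafe\u0301")            # False: different code points
print(same_text("caf\u00e9", "cafe\u0301"))   # True after NFC

# Casing: German sharp s does not round-trip through upper()/lower().
print("stra\u00dfe".upper())                  # STRASSE
print(caseless_equal("Stra\u00dfe", "STRASSE"))  # True

# Casing is locale-blind in str.lower(): Turkish capital dotted I (U+0130)
# lowercases to "i" plus a combining dot, two code points instead of one.
print(len("\u0130".lower()))                  # 2

# Encoding: decoding UTF-8 bytes with the wrong codec corrupts the text.
print(mojibake("\u00f1"))                     # two characters instead of one
```

Each of these behaviors can serve as the core of a benchmark task: the input looks ordinary to a native speaker, yet a naive string comparison, case conversion, or file read gives the wrong result.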

Benefits

  • Your schedule, your rules: Work when you want, as much or as little as you want. No fixed hours, no check-ins, no micromanaging.
  • Get paid quickly and fairly: Competitive rates, prompt payments, no chasing invoices.
  • Work on projects that actually matter: Contribute to cutting-edge AI and language technology that is shaping how humans and machines communicate.
  • Be part of something bigger: Join a global community of linguists, subject matter experts, and language professionals who are advancing human knowledge together.
  • Grow without limits: Access to diverse, innovative projects that expand your portfolio and sharpen your skills across industries and domains.
  • Have fun doing what you love: Bring your language skills to life on projects that are as interesting as they are impactful.

How to join our expert community

  • Submit your application including an updated copy of your CV in English.
  • Complete a GenAI assessment to evaluate your skills.
  • Finalize onboarding and profile set-up in our system, and become eligible for Applied AI projects.