Research Crawling Engineer @Wynd Labs
Software Development
Salary unspecified
Remote Location
🇺🇸 USA Only
Job Type full-time
Posted 5d ago

[Hiring] Research Crawling Engineer @Wynd Labs

5d ago - Wynd Labs is hiring a remote Research Crawling Engineer. 💸 Salary: unspecified 📍 Location: USA

Role Description

As a Research Crawling Engineer, you will design and operate large-scale web data acquisition systems for research and model development. Your work will span distributed systems, scraping infrastructure, and data pipelines.

Responsibilities

  • Build and maintain large-scale web crawlers across diverse domains
  • Design high-throughput, fault-tolerant systems for data collection (millions to billions of URLs/day)
  • Handle anti-bot systems, rate limits, and dynamic/JS-heavy sites
  • Develop pipelines for cleaning, deduplication, filtering, and normalization
  • Construct and maintain datasets for research and model training
  • Monitor crawl performance, coverage, and data quality; iterate quickly
  • Collaborate with research teams to align data collection with modeling needs
  • Optimize infrastructure for cost, latency, and reliability

Qualifications

  • Strong programming experience in one or more of: Go, Rust, Python, Java, or C++
  • Experience building web crawlers or large-scale data pipelines
  • Solid understanding of HTTP, networking, and browser behavior
  • Familiarity with distributed systems and parallel processing
  • Experience working with large datasets (TB–PB scale preferred)
  • Ability to debug unstable or adversarial environments

Preferred / Bonus

  • Experience with NLP pipelines or dataset curation for ML
  • Familiarity with LLM pretraining data or retrieval systems
  • Experience with headless browsers (e.g., Chrome DevTools Protocol, Playwright, Puppeteer)
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration
  • Background in data quality evaluation or benchmarking
  • Experience running workloads on cloud or bare-metal infrastructure

Evaluation Criteria

  • Ability to design systems that scale without degrading quality
  • Practical problem-solving under real-world constraints
  • Speed of iteration and ownership
  • Measurable improvements in data coverage, quality, or efficiency

Compensation

Based on experience and demonstrated ability to operate at scale.

Example Projects

  • Build a distributed crawler for a continuously updated, high-quality web project
  • Design a system to classify and filter billions of pages for pretraining
  • Extract structured data from dynamic, JS-heavy sites at scale
  • Improve deduplication and quality scoring across multimodal datasets

Why Work With Us

  • Opportunity: We are at the forefront of developing a web-scale crawler and knowledge graph that improves access to public web data and extends the value of AI to the people.
  • Culture: We're a lean team with a high bar. We come to work not to be comfortable, but to find out what we're capable of and to do work that matters. We're not calling for people who keep things moving. We're calling for people who make everyone around them better.
  • We prioritize low ego and high output. This is a fully remote team.
  • Compensation: You'll receive a competitive salary, benefits and equity package.

Before You Apply

🇺🇸 Be aware of the location restriction for this remote position: USA Only
‼ Beware of scams! When applying for jobs, you should NEVER have to pay anything.