Senior Data Architect @Omilia
Data Analysis
Salary: unspecified
Location: Remote
Job type: full-time
Posted: 2d ago

[Hiring] Senior Data Architect @Omilia

2d ago - Omilia is hiring a remote Senior Data Architect. 💸 Salary: unspecified 📍 Location: Czech Republic

Role Description

  • Own the Training Environment data architecture end-to-end: dataset design and schema for all ML training pipelines, including dialog corpora for LLM training, conversational steps for NLU models, annotated evaluation sets, and whole-call recordings for speech-to-speech model development.
  • Define and govern data selection and sampling strategy: establish criteria that determine which production conversations have the highest training value, including diversity-optimized sampling, confidence-based filtering, edge-case prioritization, and deduplication strategies.
  • Build and maintain the data catalog and dataset discovery infrastructure: enable ML engineers across LLM, NLU, Speech, and Agentic teams to find, understand, and use training data without friction.
  • Define annotation pipeline architecture: establish requirements for data labeling (intent annotation, entity tagging, dialog act classification, task completion scoring, and agentic reasoning evaluation) across internal annotators and external vendors.
  • Architect the data flywheel: the closed-loop system where real customer conversations feed back into training data collection, curation, annotation, model retraining, and evaluation.
  • Own and maintain data pipelines and infrastructure spanning Snowflake, AWS S3, ETL/ELT pipelines (Airflow), and integration with ML training workflows on AWS SageMaker.

Qualifications

  • 5+ years in data architecture, data engineering, or LLM/ML data infrastructure, with demonstrated ownership of production data systems serving ML/AI model development.
  • Strong understanding of ML training data requirements: what makes training data high-quality, diverse, and useful for LLM and NLU model development, not just clean and well-structured.
  • Deep experience with data modeling, schema design, and data pipeline architecture.
  • Strong proficiency with Snowflake, AWS S3, and ETL/ELT orchestration tools (Airflow, dbt, or similar).
  • Experience defining annotation requirements and managing data annotation workflows: intent labeling, entity tagging, dialog classification, or similar NLP annotation tasks.
  • Experience with data cataloging, metadata management, and dataset discovery at scale.
  • Strong SQL and Python skills for data pipeline development and data quality analysis.
  • Experience with data quality frameworks: deduplication, sampling strategies, diversity optimization.
  • Desirable: hands-on experience with LLM training data preparation (instruction tuning datasets, preference data, RLHF/DPO annotation, synthetic data generation).
  • Desirable: experience with data anonymization and PII/PCI redaction as part of ML data pipelines.
  • Desirable: familiarity with AWS SageMaker ML pipeline integration and active learning/data selection strategies.
  • Desirable: knowledge of voice/audio data handling, storage, and processing at scale.

Requirements

  • Excellent communication skills: ability to translate ML team data needs into concrete pipeline specifications and explain data architecture decisions to both technical and compliance audiences.
  • Strong cross-functional collaboration skills: track record of working effectively with ML engineers, platform teams, and product stakeholders.
  • Analytical mindset with the ability to make informed trade-off decisions on data quality, diversity, and scale.
  • Self-driven ownership mentality: comfortable operating as the accountable technical owner of a critical platform domain.

Benefits

  • Fixed compensation.
  • Long-term employment with vacation counted in working days.
  • Professional development opportunities (courses, training, etc.).
  • Being part of successful cutting-edge technology products that are making a global impact in the service industry.
  • Proficient and fun-to-work-with colleagues.
  • Apple gear.

Company Description

Omilia is proud to be an equal opportunity employer and is dedicated to fostering a diverse and inclusive workplace. We believe that embracing diversity in all its forms enriches our workplace and drives our collective success. We are committed to creating an environment where everyone feels welcomed, valued, and empowered to contribute their unique perspectives. All eligible candidates will be given consideration for employment without regard to factors such as race, color, religion, gender, gender identity or expression, sexual orientation, national origin, heredity, disability, age, or veteran status.

Before You Apply
Be aware of the location restriction for this remote position: Czech Republic
Beware of scams! When applying for jobs, you should NEVER have to pay anything.