Senior Bioinformatician – Genomics Data Infrastructure @Violet Research Institute

Salary: USD 175,000 - 225,000	Remote Location: 🇺🇸 USA Only
Employment Type: Full-time	Posted: 4 weeks ago

Posted 4 weeks ago. Violet Research Institute is hiring a remote Senior Bioinformatician – Genomics Data Infrastructure. 💸 Salary: USD 175,000 - 225,000 per year. 📍 Location: USA.

Role Description

As the founding Bioinformatician at Violet Research Institute (VRI), you will be the architect and owner of VRI’s entire genomics data foundation. This is not an analyst role embedded in someone else’s pipeline. You will design, build, and steward the systems that collect, unify, quality-control, and surface all genetic, sequencing, and assay data across the organization, replacing fragmented, ad hoc processes with a rigorous, reproducible, and scalable data layer.

Your work begins with PacBio long-read whole-genome sequencing (WGS) and multiomics integration, and will grow into a platform that can onboard new patients and indications quickly and consistently. You will partner closely with computational biologists and clinical scientists to make data trustworthy and analysis-ready, so that these experts can make fast, accurate clinical interpretations. Your job is to make their work faster, more reliable, and fully reproducible.

You will also own the data infrastructure end to end: architecting the integrations between systems, defining the data ontology and structure, and building everything else needed to ensure clean data and reproducible processes and analyses. The infrastructure you create will have a direct, near-term impact on real patients’ lives.

What You’ll Own

Genomics Data Management & Stewardship
- Own the full lifecycle of VRI’s genomics data, from raw sequencer output (FASTQ, BAM/CRAM, VCF) through QC, storage, versioning, and retrieval, as the single accountable person for data integrity across all datasets.
- Define and enforce data standards, naming conventions, metadata schemas, and ontologies for all data types: sequences, variant calls, splicing data, and experimental assay results.
- Build and maintain a centralized, queryable genomics data lake that unifies heterogeneous inputs from internal labs and CRO partners (US, Israel, China) into a single, analysis-ready data model.
- Establish sample tracking, data lineage documentation, and versioning protocols so every result is traceable back to its source.
- Manage the cloud storage strategy (AWS S3 or GCP) across hot, warm, and cold tiers, balancing cost, accessibility, and HIPAA-compliant security.
- Create and maintain an internal data catalog documenting all datasets, pipeline versions, and transformation logic so any scientist can understand what data exists and how it was produced.
Pipeline Development & Data Engineering
- Design and build production-grade, reusable pipelines for ingesting and processing PacBio long-read WGS data, including phased genome assembly, structural variant calling, and SNP/indel detection.
- Build ETL workflows that clean, normalize, and integrate diverse data modalities (sequencing reads, RNA/splicing data, and assay metadata) into unified, analysis-ready formats.
- Automate QC steps to surface data anomalies early; monitor data quality continuously across sequencing batches and CRO handoffs.
- Establish code quality standards, testing protocols, and deployment practices (version control, containerization) that will scale as the team grows.
- Maintain and develop internal database systems, including VRI OS, our proprietary experiment-tracking platform: safeguard data integrity, handle system upkeep, and build custom tools and interfaces that support research workflows.
- Integrate physics-based thermodynamic models and predictive algorithms to forecast therapeutic performance and guide design decisions.
- Develop and apply design criteria and ranking systems to evaluate therapeutic candidates computationally before advancing to wet lab testing.
- Build and maintain algorithms that bridge computational predictions with experimental validation, optimizing the design-to-testing pipeline.
Multiomics Integration
- Integrate multi-layered genomics data (DNA, RNA-seq, long-read RNA, splicing) with proteomics, metabolomics, and mass spectrometry data (LC-MS, MS/MS) into coherent, patient-centric multiomics datasets.
- Query and harmonize large-scale population cohorts (UK Biobank, Mount Sinai Million, and similar) to contextualize patient findings.
- Partner with computational biologists and clinical scientists to surface analysis-ready datasets, enabling and supporting their interpretation work.
Insight Delivery & Reporting
- Build automated reporting pipelines that push structured summaries of data quality, pipeline status, and batch results to scientific stakeholders, thereby replacing manual handoffs.
- Develop QC dashboards to surface data quality metrics, pipeline status, and anomaly alerts in real time.
- Directly support Investigational New Drug (IND) filings by preparing the relevant datasets and written reports.
Continuous Improvement
- Actively monitor the bioinformatics landscape — using AI-assisted tools where applicable — to identify emerging algorithms and platforms that can sharpen VRI’s data infrastructure.
- Lay the foundation for future bioinformatics hires by embedding well-documented, reproducible data practices from day one.

Qualifications

- 4+ years of hands-on bioinformatics experience in a research or biotech environment, with a strong focus on genomics data management and pipeline engineering.
- Proven experience owning genomics data end to end: not just running analyses, but building the systems and standards that make data trustworthy and reusable.
- Strong fluency in genomics file formats and toolchains: FASTQ, BAM/CRAM, VCF, BED; variant callers (GATK, DeepVariant, pbsv); assembly tools (hifiasm or equivalent).
- Demonstrated experience with PacBio long-read WGS data and associated long-read tooling.
- Proficiency in Python; experience building and maintaining production-grade pipelines with workflow managers (Nextflow, Snakemake, or WDL).
- Hands-on experience with cloud data infrastructure: AWS S3 or GCP, data lake design, pipeline orchestration, and HIPAA-compliant storage.
- Experience querying and integrating biobank-scale datasets (UK Biobank or similar).
- Strong organizational skills: you naturally document your work, build systems others can use, and take ownership of data quality without being asked.

Requirements

- Experience with RNA-seq and long-read RNA analysis, including pre-mRNA processing and splicing characterization.
- Familiarity with LIMS (Benchling, LabVantage, or similar) and data governance / FAIR data frameworks.
- Familiarity with containerization tools (Docker, Singularity) and CI/CD practices.
- Exposure to siRNA, ASO, or other therapeutic modality-specific bioinformatics.
- Experience in a seed or early-stage biotech; comfort building infrastructure from scratch.

Behavioral Essentials

- Execute independently on loosely specified tasks; you are self-directed.
- Ask for help only when truly blocked, communicating clearly what is needed and what you have already tried.
- Thrive in early-stage, ambiguous, fast-paced environments where the path is built as you walk it.
- Mission-driven, with genuine, active care for patient impact (a daily operating principle at VRI).

AI, Tools & Operating Environment

At VRI we genuinely embrace AI at every step of the process. Claude and other AI tools are used throughout the day, across every function. Computational fluency and comfort with AI-assisted analysis and literature synthesis are expected. If you treat AI as a novelty or an occasional aid, this is not the right environment.

How We Hire

We are looking to hire immediately and are moving quickly. Our anticipated process can take as little as 5 days: Apply → Initial Recruiter Call → Hiring Manager Interview → Technical Stakeholder Interview → Executive Director Interview → Offer.

Compensation & Benefits

VRI provides competitive compensation based upon experience, qualifications, and role scope, starting at $175k. We also offer a full suite of benefits.
