Data Infrastructure Engineer @Guidehouse
Software Development
Salary: USD 98,000 - 163,000
Remote Location
🇺🇸 USA Only
Job Type full-time
Posted 2d ago

[Hiring] Data Infrastructure Engineer @Guidehouse

2d ago - Guidehouse is hiring a remote Data Infrastructure Engineer. 💸 Salary: USD 98,000 - 163,000 per year 📍 Location: USA

Role Description

We are seeking a Data Infrastructure Engineer to build and operate the data platform that powers AI/ML analytics modules. You will design and implement scalable data ingestion pipelines, robust ETL/ELT, and a modern data lake / delta lake (lakehouse) on AWS. You’ll also establish a managed metadata repository and governance layers (catalog, lineage, quality, access controls) and deliver automated cloud provisioning plus CI/CD for data pipelines to enable reliable, repeatable deployments across environments. This role is ideal for an engineer who enjoys platform building, automation, and enabling advanced analytics through trusted, well-governed data.

What You Will Do:

  • Build & Operate Data Pipelines (Batch + Streaming)
    • Design and implement batch and streaming ingestion from APIs, relational databases, file drops, event streams, and external partners.
    • Build and optimize ETL/ELT pipelines to produce curated, analytics-ready datasets for reporting and ML consumption.
    • Implement incremental processing patterns, change data capture (CDC) approaches where appropriate, and data contract standards.
  • Deliver a Modern Lakehouse (Data Lake / Delta Lake)
    • Build and manage a scalable lakehouse on AWS object storage (e.g., S3) using open table/file formats and delta/lakehouse concepts (e.g., ACID tables, schema evolution, time travel patterns).
    • Optimize performance and cost through partitioning, compaction, lifecycle policies, and efficient compute/storage usage.
    • Establish environment standards for dev/test/prod and consistent promotion across stages.
  • Metadata, Governance, Lineage & Quality (Trust Layer)
    • Implement a managed metadata repository for dataset cataloging, ownership, glossary/definitions, tagging, and discoverability.
    • Enable end-to-end lineage (source → transformations → consumption) to support auditability and impact analysis.
    • Implement governance controls including policy-based access, data classification, retention, and secure data handling.
    • Build operational data quality checks (freshness, completeness, validity, anomaly detection) and publish SLAs/SLOs.
  • AWS Automation + CI/CD for Data Pipelines
    • Implement automated cloud provisioning in AWS using Infrastructure as Code (IaC) for consistent environments and secure-by-default baselines.
    • Build and enhance CI/CD for data pipelines, including automated tests, validation gates, promotion workflows, and rollback strategies.
    • Improve observability with metrics/logs/alerts, dashboards, runbooks, and incident response readiness.
  • Cross-Team Collaboration & Documentation
    • Work closely with engineering, security, networking, and application teams to support mission needs and delivery timelines.
    • Maintain high-quality engineering documentation including SOPs, system diagrams, and secure configuration baselines.
    • Summarize and present findings and recommendations, both written and verbal, to technical and non-technical stakeholders.

Qualifications

  • Must be able to obtain and maintain a Federal or DoD Public Trust; candidates must receive approved adjudication of their Public Trust prior to onboarding with Guidehouse. Candidates with an active Public Trust or Suitability are preferred.
  • Bachelor’s degree in Engineering, IT, Computer Science, or related field (or equivalent experience).
  • Minimum of four (4) years of experience building production data pipelines and/or data platforms.
  • Strong experience implementing data ingestion and ETL/ELT workflows, including data modeling and transformation best practices.
  • Hands-on experience building a data lake / delta lake (lakehouse) on AWS (or equivalent cloud) using object storage and modern table formats/patterns.
  • Proficiency in SQL and one programming language commonly used for data engineering (Python preferred; Scala/Java acceptable).
  • Experience with metadata management and governance: cataloging, lineage, ownership, access controls, classification, and policy enforcement.
  • Experience implementing automated AWS provisioning using IaC and operating across multiple environments.
  • Experience building or operating CI/CD pipelines for data workflows (testing, packaging, deployment automation, environment promotion).
  • Solid security fundamentals: IAM/least privilege, encryption, secrets management, secure SDLC practices.
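
The operational data-quality checks the role describes (freshness, completeness, validity) often reduce to small, testable assertions that can also serve as CI validation gates. A minimal sketch, with hypothetical field names and thresholds:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical operational data-quality checks, returning a dict of
# check name -> pass/fail. Field names and thresholds are illustrative.

def run_quality_checks(rows, max_age=timedelta(hours=24), now=None):
    """rows: list of dicts with 'id', 'amount', 'updated_at' (ISO-8601, UTC)."""
    now = now or datetime.now(timezone.utc)
    newest = max(
        (datetime.fromisoformat(r["updated_at"]) for r in rows),
        default=None,
    )
    return {
        # Freshness: the newest record must be recent enough.
        "freshness": newest is not None and (now - newest) <= max_age,
        # Completeness: no required field may be missing or null.
        "completeness": all(
            r.get("id") is not None and r.get("amount") is not None
            for r in rows
        ),
        # Validity: amounts must fall in an expected range.
        "validity": all(
            0 <= r["amount"] < 1_000_000
            for r in rows
            if r.get("amount") is not None
        ),
    }
```

Publishing the results of checks like these against agreed SLAs/SLOs is what turns them into the "trust layer" the posting refers to.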

Requirements

  • Hands-on experience with Databricks.
  • Hands-on experience utilizing modern DevOps practices, including tools like Git, Terraform, Jenkins, AWS CodePipeline, and Docker.
  • Experience utilizing AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Cursor, Kiro) to safely accelerate implementation while maintaining strict code quality through testing, code reviews, and security practices.
  • Knowledge graph and Graph RAG experience, including:
    • Graph modeling and ontology/taxonomy alignment.
    • Entity resolution and relationship extraction.
    • Hybrid retrieval approaches combining graph traversal with semantic/vector search to improve grounding and explainability.
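
The hybrid-retrieval idea in the last bullet (graph traversal combined with semantic/vector search) can be illustrated in miniature. The documents, embeddings, and graph edges below are made up; a real system would use a vector store and a graph database rather than in-memory dicts:

```python
import math

# Toy hybrid retrieval: blend vector similarity with a one-hop graph boost.
# DOCS maps doc id -> pretend 2-d embedding; EDGES are knowledge-graph links.
DOCS = {
    "d1": [1.0, 0.0],
    "d2": [0.9, 0.1],
    "d3": [0.0, 1.0],
}
EDGES = {"d1": {"d3"}, "d3": {"d1"}}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_search(query_vec, k=2, alpha=0.5):
    # Semantic score from embedding similarity.
    sem = {d: cosine(query_vec, v) for d, v in DOCS.items()}
    # Graph score: boost the top semantic hit and its graph neighbors,
    # so linked entities surface even with weak vector similarity.
    top = max(sem, key=sem.get)
    neighbors = EDGES.get(top, set())
    graph = {d: 1.0 if (d == top or d in neighbors) else 0.0 for d in DOCS}
    # Blend the two signals and return the top-k doc ids.
    blended = {d: alpha * sem[d] + (1 - alpha) * graph[d] for d in DOCS}
    return sorted(blended, key=blended.get, reverse=True)[:k]
```

Here the graph boost lets `d3` outrank the semantically closer `d2` for a query near `d1`, which is the grounding/explainability benefit the bullet alludes to; production Graph RAG systems typically use richer traversals and learned rankers.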

Benefits

  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Parental Leave
  • 401(k) Retirement Plan
  • Group Term Life and Travel Assistance
  • Voluntary Life and AD&D Insurance
  • Health Savings Account, Health Care & Dependent Care Flexible Spending Accounts
  • Transit and Parking Commuter Benefits
  • Short-Term & Long-Term Disability
  • Tuition Reimbursement, Personal Development, Certifications & Learning Opportunities
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Care.com annual membership
  • Employee Assistance Program
  • Supplemental Benefits via Corestream (Critical Care, Hospital Indemnity, Accident Insurance, Legal Assistance and ID theft protection, etc.)
  • Position may be eligible for a discretionary variable incentive bonus
Before You Apply
🇺🇸 Be aware of the location restriction for this remote position: USA Only
Beware of scams! When applying for jobs, you should NEVER have to pay anything.