AI Infrastructure Engineer @Bright Vision Technologies
Artificial Intelligence
Salary $100k - $150k
Remote Location
🇺🇸 USA Only
Employment Type full-time
Posted yesterday

Posted yesterday: Bright Vision Technologies is hiring a remote AI Infrastructure Engineer. 💸 Salary: $100k - $150k. 📍 Location: USA.

Role Description

We are seeking an AI Infrastructure Engineer to design, build, and operate the platform layer that powers large-scale AI training and inference workloads. The role focuses on:

  • GPU clusters
  • Distributed training frameworks
  • Scheduling
  • Storage performance
  • Developer experience for ML engineers and researchers

The ideal candidate has built or operated production AI infrastructure at scale and understands how hardware, kernel, scheduler, and ML framework interact. They bring strong software engineering discipline to platform work.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science or a related field
  • Six or more years of experience in infrastructure, platform, or HPC engineering
  • Hands-on experience operating GPU clusters or large-scale ML training infrastructure
  • Strong proficiency in Python and at least one systems language such as Go or C++
  • Deep understanding of distributed training, accelerator architectures, and collective communication
  • Experience with Kubernetes, Slurm, Ray, or similar scheduling systems for ML workloads
  • Strong understanding of Linux internals, networking, and high-performance storage
  • Experience with at least one major cloud provider’s ML infrastructure offerings
  • Strong software engineering practices including testing, CI/CD, and code review
  • Excellent communication and cross-functional collaboration skills

Responsibilities

  • Design and operate GPU and accelerator infrastructure for training and inference
  • Build scheduling, queueing, and resource-sharing systems
  • Integrate frameworks such as PyTorch, JAX, DeepSpeed, FSDP, Megatron-LM, and Ray Train
  • Operate high-performance storage systems and data pipelines
  • Design networking architectures supporting RDMA, InfiniBand, and NCCL
  • Build observability for AI workloads
  • Implement checkpointing, restart, and fault-tolerance patterns
  • Drive cost optimization across compute, storage, and networking
  • Build developer tooling and paved-road workflows
  • Partner with research and applied ML teams
  • Implement security controls, isolation, and access management
  • Drive automation across cluster provisioning and lifecycle management
  • Maintain runbooks, capacity dashboards, and operational documentation
  • Stay current with AI infrastructure research and emerging open-source AI tooling

Benefits

  • Competitive base salary commensurate with experience
  • Full-time, direct W2 employment
  • Long-term, multi-year engagement