
Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization @Luma Ai

Software Development

Salary: not listed | Remote location: 🇺🇸 USA only
Job type: Full-time | Posted 5d ago


Mar 22, 2025 - Luma Ai is hiring a remote Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization. 📍Location: USA.

Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step-function change will come from vision. So we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

We are looking for engineers with significant experience designing and maintaining highly efficient systems and code that can be optimized to run on multiple hardware platforms, bringing our state-of-the-art models to as many people as possible at the best performance per dollar.

Responsibilities

Ensure efficient implementation of models & systems, with a focus on designing, maintaining, and writing abstractions that scale beyond NVIDIA/CUDA hardware.
Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, dropping down to Triton or similar kernel-level languages when necessary.
Benchmark our products across a variety of hardware & software to help the product team understand the optimal tradeoffs between latency, throughput, and cost at various degrees of parallelism.
Work with our partners to help them identify bottlenecks and push forward new iterations of their hardware and software.
Work closely with the rest of the research team to ensure systems are designed to be as efficient as possible from start to finish, and raise potential hardware-integration issues.
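To make the latency/throughput/cost tradeoff above concrete, here is a minimal sketch of how benchmark measurements might be summarized and compared. The `BenchResult` type, its fields, the hourly device price, and the sample numbers are all hypothetical illustrations, not Luma's tooling or data; real measurements would come from profiling runs on each hardware target.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    """One benchmark measurement (hypothetical fields, for illustration only)."""
    devices: int                  # degree of parallelism (number of accelerators)
    batch_size: int               # requests processed per forward pass
    latency_s: float              # measured wall-clock time per batch, in seconds
    price_per_device_hour: float  # assumed hourly price of one device, in dollars


def throughput(r: BenchResult) -> float:
    """Items processed per second."""
    return r.batch_size / r.latency_s


def cost_per_1k_items(r: BenchResult) -> float:
    """Dollars per 1,000 items: total hourly cost divided by items per hour."""
    hourly_cost = r.devices * r.price_per_device_hour
    items_per_hour = throughput(r) * 3600
    return hourly_cost / items_per_hour * 1000


def best_under_latency(results, max_latency_s):
    """Cheapest configuration that still meets the latency budget, or None."""
    eligible = [r for r in results if r.latency_s <= max_latency_s]
    return min(eligible, key=cost_per_1k_items, default=None)


# Hypothetical measurements for three configurations.
results = [
    BenchResult(devices=1, batch_size=8, latency_s=2.0, price_per_device_hour=2.0),
    BenchResult(devices=4, batch_size=8, latency_s=0.6, price_per_device_hour=2.0),
    BenchResult(devices=4, batch_size=32, latency_s=1.2, price_per_device_hour=2.0),
]

for r in results:
    print(r.devices, r.batch_size, round(throughput(r), 2), round(cost_per_1k_items(r), 4))
```

With these made-up numbers, a tight latency budget (1.0 s) forces the small-batch, four-device configuration, while a looser budget (1.5 s) lets the larger batch win on cost; that shifting optimum is the kind of tradeoff the benchmarking work is meant to surface.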

Must have experience

Experience optimizing for memory, latency, and throughput in PyTorch.
- Bonus: experience with non-NVIDIA systems
Experience using torch.compile / PyTorch/XLA (torch_xla).
Experience benchmarking and profiling GPU & CPU code in PyTorch for optimal device utilization (examples: torch profiler, memory profilers, trace viewers, custom tooling).
Experience building tools & abstractions to ensure models run optimally on different hardware and software stacks.
Experience working with transformer models and attention implementations.
Experience with parallel inference, particularly tensor parallelism and pipeline parallelism.
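Since tensor parallelism appears in the list above, a minimal single-process sketch of the idea may help: a column-parallel linear layer in the style of Megatron-LM, where each "device" holds a slice of the weight matrix's columns. Plain Python lists stand in for tensors and a list of shards stands in for devices; the function names are hypothetical, and a real implementation would use PyTorch tensors with a collective such as all-gather across actual accelerators.

```python
def matmul(a, b):
    """Plain-Python matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def split_columns(w, num_shards):
    """Split weight matrix w column-wise into num_shards equal shards."""
    n = len(w[0])
    assert n % num_shards == 0, "output dim must divide evenly across shards"
    step = n // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]


def column_parallel_linear(x, w, num_shards):
    """Simulate column-parallel tensor parallelism on one machine.

    Each shard stands in for one device holding a slice of w's columns.
    Every device sees the full input x and computes its slice of the output;
    concatenating the partial outputs row by row (an all-gather on real
    hardware) reproduces the full x @ w.
    """
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    return [sum((part[i] for part in partial_outputs), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, num_shards=2))
```

The design point the sketch illustrates is that each device stores and multiplies only its own weight slice, trading memory per device for a communication step to reassemble the output.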

Good to have experience

Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops. Top candidates will be able to write fused kernels for common hot paths, understand when to make use of lower-level features like tensor cores or warp intrinsics, and understand where these tools can be most impactful.
Experience writing high-performance parallel C++. Bonus if done in an ML context with PyTorch, e.g., for data loading, data processing, or inference code.
Experience building inference / demo prototype code (e.g., Gradio, Docker).

About The Company

Luma Ai
