Get daily remote job opportunities in your inbox

No middlemen, no spam, no infinite scrolling.

Get relevant job opportunities, one email at a time.

Unsubscribe at any time.

Senior Research Engineer - Performance Optimization @Luma Ai

[Hiring] Senior Research Engineer - Performance Optimization @Luma Ai

Mar 17, 2025 - Luma Ai is hiring a remote Senior Research Engineer - Performance Optimization. 💸 Salary: $180,000 - $250,000/year. 📍Location: USA.

This description is a summary of our understanding of the job description. Click on 'Apply' button to find out more.

Role Description

We are looking for engineers with significant problem solving experience in PyTorch, CUDA and distributed systems. You will work with Research Scientists to build & train cutting edge foundation models on thousands of GPUs.

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment
  • Identify and implement optimization techniques for massively parallel and distributed systems
  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code
  • Work closely together with the research team to ensure systems are planned to be as efficient as possible from start to finish
  • Build tools to visualize, evaluate and filter datasets
  • Implement cutting-edge product prototypes based on multimodal generative AI

Qualifications

  • Experience training large models using Python & Pytorch, including practical experience working with the entire development pipeline from data processing, preparation & data loading to training and inference.
  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing etc.)
  • Experience with profiling CPU & GPU code in PyTorch, including Nvidia Nsight or similar.
  • Experience writing & improving highly parallel & distributed PyTorch code, with familiarity in DDP, FSDP, Tensor Parallel, etc.
  • Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, like for data loading, data processing, inference code.
  • Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to utilize tensor cores; optimize performance with CUDA memory and other similar skills.
  • Good to have experience working with Deep learning concepts such as Transformers & Multimodal Generative models such as Diffusion Models and GANs.
  • Good to have experience building inference / demo prototype code (incl. Gradio, Docker etc.)

Compensation

The pay range for this position in California is $180,000 - $250,000yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

Similar Remote Jobs

More jobs at Luma Ai

More AI / ML jobs

More jobs in USA

Before You Apply
📍 Be aware of the location restriction for this remote position: USA
Beware of scams! When applying for jobs, you should NEVER have to pay anything. Learn more.
Senior Research Engineer - Performance Optimization @Luma Ai
Software Development
Salary 💸 $180,000 - $250,000/year
Remote Location
USA
Job Type full-time
Posted Mar 17, 2025
Apply for this position Unlock 52,660 Remote Jobs
📍 Be aware of the location restriction for this remote position: USA
Beware of scams! When applying for jobs, you should NEVER have to pay anything. Learn more.
Senior Research Engineer - Performance Optimization Apply for this position Unlock 52,660 Remote Jobs
×
  • Unlock 52,660 hidden remote jobs.
  • Your shortcut to remote work. Apply before everyone else.
  • Click and apply. No middlemen, no hassle.

We’re not like the other sites. Come see why!

50% off in March 2025
  • Single payment
  • Lifetime access
  • Filter by location/skills/salary…
  • Create custom email alerts
  • Private Slack Community