Salary: unspecified
Remote Location: Worldwide
Job Type: full-time
Posted: 2d ago
Andromeda Cluster is hiring a remote Senior Site Reliability Engineer - AI Infrastructure. Salary: unspecified. Location: Worldwide.
Role Description
This is not a generalist SRE role. You will design, operate, and debug large-scale GPU infrastructure used for distributed training and inference, working directly with customers pushing the limits of modern AI systems.
We’re looking for engineers who have personally run GPU clusters in production, understand the failure modes of distributed training, and can reason about performance from network fabric → kernel → framework.
Qualifications
Strong Candidates May Have
Benefits
Company Description
Andromeda Cluster is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Be aware of the location restriction for this remote position: Worldwide
Beware of scams! When applying for jobs, you should NEVER have to pay anything.