Senior Engineer - Site Reliability @Core42 US Services LLC

Devops

Salary usd 109,600 - 1..	Remote Location 🇺🇸 USA Only
Employment Type full-time	Posted 1wk ago

Posted 1wk ago: Core42 US Services LLC is hiring a remote Senior Engineer - Site Reliability. 💸 Salary: USD 109,600 - 164,400 per year. 📍 Location: USA

Role Description

As a Senior Site Reliability Engineer, you will be responsible for designing, implementing, and operating scalable, reliable, and secure infrastructure to support large-scale AI and HPC workloads. You will play a key role in building and maintaining CI/CD pipelines, Kubernetes-based environments, and observability systems that ensure high availability and performance across globally distributed platforms.

Working closely with engineering, product, and operations teams, you will drive automation, enforce SRE best practices, and contribute to a resilient and efficient infrastructure ecosystem that supports mission-critical applications.

Your Key Responsibilities

CI/CD & Automation: Design, build, and maintain robust CI/CD pipelines using tools such as GitLab CI, Azure DevOps, and/or Jenkins to enable rapid and secure software delivery.
Kubernetes Operations: Operate, manage, and optimize Kubernetes clusters, ensuring scalability, performance, and resilience.
Infrastructure as Code: Develop and maintain infrastructure using Terraform, Helm, Ansible, or similar tools to automate provisioning and configuration.
Observability & Monitoring: Implement and manage monitoring solutions using Prometheus, VictoriaMetrics, Grafana, and ELK/EFK to ensure system health and performance.
Incident Management: Lead root cause analysis (RCA), post-mortems, and continuous improvement initiatives to enhance system reliability.
Reliability Engineering: Define and implement SRE best practices, including SLAs, SLOs, and error budgets.
Logging & Alerting: Build and maintain logging, alerting, and tracing systems for proactive issue detection and rapid troubleshooting.
Security & Compliance: Enforce security best practices and compliance standards across CI/CD pipelines and runtime environments; support audit readiness.
Collaboration: Work cross-functionally with engineering, product, and infrastructure teams to align platform capabilities with business needs.
Mentorship: Provide guidance and mentorship to junior engineers and contribute to knowledge sharing across teams.
On-call Support: Participate in on-call rotations to support critical platform services.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
5+ years of experience in DevOps, Site Reliability Engineering, or platform engineering roles in production environments.
Proven experience managing Kubernetes clusters (e.g., GKE, EKS, AKS, or self-managed).
Hands-on experience with CI/CD tools and automation frameworks.
Strong experience with infrastructure-as-code tools such as Terraform, Helm, or Ansible.
Proficiency in container technologies (Docker, containerd) and orchestration with Kubernetes.
Strong scripting/programming skills (e.g., Python, Bash, Go).
Experience with observability and monitoring stacks (Prometheus, Grafana, ELK/EFK).
Solid understanding of Linux systems, networking concepts, and cloud-native security best practices.

Preferred Skills/Qualifications

Experience supporting AI/ML or HPC workloads in production environments.
Knowledge of GPU resource management, workload schedulers, and performance tuning.
Familiarity with distributed systems and large-scale infrastructure environments.
Experience with incident management frameworks and reliability engineering practices.
Strong collaboration and communication skills across cross-functional teams.

Compensation

The U.S. base salary range for this full-time role is $109,600 to $164,400, with bonus and benefits on top. Salary ranges are set according to the role, level, and location. The range listed represents the minimum and maximum target salary for new hires across all U.S. locations. Actual pay within this range will depend on factors such as work location, job-related skills, experience, and relevant education or training.

Benefits

With a diverse team of 1,100+ employees from 68 nationalities, we foster an inclusive, innovative, and collaborative environment.
Our culture is grounded in trust, accountability, and high performance.
Our values include:

Grit – overcoming challenges with resilience and determination.
Passion – striving for excellence in everything we do.
Impact – driving meaningful change and progress.

Our team members thrive in an environment where each contribution matters, and together, we achieve extraordinary results.
