Site Reliability Engineer @HostPapa

DevOps / Sysadmin

Salary: competitive salary
Location: Canada (remote)
Job type: full-time
Posted: 3 weeks ago

HostPapa is hiring a remote Site Reliability Engineer. Salary: competitive salary. Location: Canada.

Role Description

This role focuses on CloudBlue, a HostPapa business that powers cloud commerce for many of the world’s largest service providers, including major telcos, distributors, and MSPs. CloudBlue enables partners to monetize and manage cloud services and subscriptions at scale, combining the agility of a high-growth business with the backing of a global organization.

As the Site Reliability Engineer, you will help ensure the reliability, scalability, and observability of CloudBlue’s multi-tenant SaaS platforms used by service providers worldwide. You will focus on improving system stability and performance through monitoring, high availability, and incident response, while working closely with DevOps, Platform, and Engineering teams to build and operate resilient production systems.

Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services to ensure reliability and performance.
Influence system architecture with a strong focus on reliability, scalability, and operability, designing systems for fault tolerance, graceful degradation, and self-healing.
Reduce operational toil by identifying opportunities for automation and process improvement.
Design and operate CloudBlue’s observability stack across metrics, logs, and traces using tools such as Datadog, Grafana, and Elastic Stack.
Develop actionable alerting strategies and dashboards that provide clear insight into platform and business health.
Design and maintain high-availability architectures, implementing redundancy, failover, and disaster recovery strategies across regions and availability zones.
Conduct capacity planning, load testing, and performance optimization to ensure platform stability and scalability.
Act as a senior responder during production incidents, leading incident coordination, communication, and service restoration.
Own blameless postmortems and drive improvements that reduce incident frequency, MTTR, and customer impact.
Improve reliability of Kubernetes-based platforms through health checks, autoscaling strategies, rollout safety, and resilience testing.
Partner with engineering and DevOps teams to improve deployment safety, rollback strategies, and platform reliability.
Maintain runbooks and operational documentation, and promote SRE best practices across engineering teams.
Support other tasks or projects as assigned to meet team and business needs.

Qualifications

3+ years of experience as an SRE, DevOps Engineer, or Production Engineer, with strong ownership of production systems.
Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms.
Hands-on experience with observability and monitoring tools such as Datadog, Grafana, and Elasticsearch/Kibana.
Solid understanding of Linux, networking, and distributed systems fundamentals.
Experience working with containerized environments such as Docker and Kubernetes.
Strong scripting and automation skills using Python and/or Bash.
Experience participating in on-call rotations and incident response in production environments.
Strong written and spoken English.
Experience defining SLIs/SLOs and managing error budgets at scale will be considered a plus.
Exposure to hyperscale or service-provider-grade platforms is an advantage.
Cloud experience, preferably with Azure; experience with AWS and/or GCP will also be valued.
Experience working with hybrid or on-premises integrations is beneficial.
Familiarity with chaos engineering and resilience testing will be considered an asset.

Benefits

Work from anywhere - this is a remote opportunity.
A competitive salary that values you and your unique skill sets.
Career advancement & professional development opportunities to help you reach your full potential.
Flexible work arrangements to support work/life balance.
