Site Reliability Engineer/Developer @TechInsights
DevOps / Sysadmin
Salary $109,600 – 116,..
Remote Location
Job Type full-time
Posted 4d ago

4d ago - TechInsights is hiring a remote Site Reliability Engineer/Developer. Salary: $109,600 – 116,100 CAD. Location: Canada

Role Description

The Site Reliability Developer is responsible for designing, implementing, and maintaining the reliable, scalable cloud infrastructure that powers TechInsights' semiconductor intelligence applications. This role sits at the intersection of software engineering and systems operations: building automation tools, establishing infrastructure patterns, and ensuring production environments consistently meet availability and performance standards across a multi-region AWS environment.

Working within the cloud operations team, the Site Reliability Developer brings advanced technical expertise to complex infrastructure challenges, applies site reliability engineering best practices, and partners with development teams to enable efficient, reliable software delivery at scale. This is a role for an engineer who can independently drive solutions to complex problems with a meaningful impact on operational and service-delivery outcomes.

The ideal candidate brings exceptional depth and breadth in reliability engineering and cloud infrastructure. They are equally adept at interpreting business and technical challenges, recommending improvements to infrastructure and processes, and delivering innovative solutions that raise the bar for operational excellence across the organization.

Qualifications

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience
  • 5–7 years in Site Reliability Engineering, DevOps, or cloud operations
  • Strong AWS expertise (EC2, ECS/EKS, RDS, S3, Lambda, VPC) and hybrid cloud environments
  • Proficiency in Python, Go, or Java; experience with Docker, Kubernetes, and container orchestration
  • Expertise in infrastructure-as-code (Terraform, Ansible, CloudFormation) and CI/CD pipeline development
  • Experience with observability tools (Prometheus, Grafana, DataDog, CloudWatch, PagerDuty)
  • Solid foundation in Linux/Unix administration, networking, security, and database systems

Requirements

  • Design, implement, and maintain highly available, scalable infrastructure systems across multi-region AWS deployments, ensuring production environments consistently meet availability and performance requirements.
  • Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) in collaboration with development teams, using metrics to quantify and continuously improve system reliability.
  • Monitor system performance, availability, and resource utilization using CloudWatch, DataDog, and Prometheus, proactively identifying optimization opportunities and conducting root cause analysis for outages and degradations.
  • Implement capacity planning strategies using historical data analysis and growth projections to ensure infrastructure scales ahead of demand, balanced against cost optimization using AWS Cost Explorer and Kubecost.
  • Create comprehensive infrastructure-as-code solutions using Terraform and GitOps methodologies to manage AWS resources consistently, securely, and repeatably.
  • Develop and maintain CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions to automate deployment processes with built-in testing and validation.
  • Implement and maintain containerization platforms using Docker and Kubernetes, establishing standards for container orchestration, cluster management, and reusable infrastructure patterns.
  • Build automation tools and scripts in Python, Go, or Java to eliminate manual operational tasks, reduce toil, and automate routine maintenance procedures including patching, backups, and resource cleanup.
  • Lead incident response for critical system outages and performance issues, coordinating cross-functional teams to diagnose and resolve problems with speed and precision.
  • Implement comprehensive observability solutions (including logging, monitoring, distributed tracing, and intelligent alerting via Grafana and PagerDuty) that ensure rapid response to genuine issues while minimizing alert fatigue.
  • Conduct blameless post-mortems and thorough post-incident reviews, documenting lessons learned and driving implementation of preventive measures and updated runbooks.
  • Develop and maintain disaster recovery procedures and business continuity plans, including regular testing, and collaborate with Security and Compliance teams to ensure monitoring systems meet audit and regulatory requirements.

Benefits

  • Company-sponsored training and development opportunities
  • Comprehensive benefits package (health, dental, vision, wellness, RRSP Matching, annual fitness reimbursement)
  • Flexible vacation policy
  • Bring your own device program
  • Community involvement opportunities through charitable alliances
  • Wellness resources and support
  • Inclusive environment that prioritizes diversity, equity, and accessibility
  • High-growth company driven by high performance
  • Expected salary range: $109,600 – 116,100 CAD

Working Arrangement

  • This is a remote position for candidates based in Canada
  • Occasional travel may be required