[Hiring] Staff Site Reliability & DevOps Engineer - Observability @Brandwatch
DevOps
Salary: unspecified
Location: Remote
Employment type: Full-time
Posted: 3wks ago

3wks ago - Brandwatch is hiring a remote Staff Site Reliability & DevOps Engineer - Observability. 💸 Salary: unspecified 📍 Location: Bulgaria

Role Description

This role focuses on designing, operating, and evolving observability platforms with a strong emphasis on metrics, logging, and alerting. The primary tooling is Grafana and Prometheus, with responsibility for ensuring production systems are observable, reliable, and operable at scale. The role works closely with platform, infrastructure, and application teams.

  • Design, build, and operate observability platforms based on Grafana and Prometheus
  • Define and maintain metrics standards, dashboards, alerts, and SLOs
  • Improve signal quality: reduce alert noise, tune thresholds, and improve runbooks
  • Support incident response by providing actionable telemetry and post-incident analysis
  • Integrate metrics, logs, and traces across distributed systems
  • Work with engineering teams to instrument services correctly
  • Automate observability configuration using infrastructure as code
  • Contribute to reliability improvements through capacity planning and performance analysis

Qualifications

  • Strong experience with Prometheus (scraping, federation, recording rules, alerting)
  • Strong experience with Grafana (dashboards, alerting, templating, RBAC)
  • Solid Linux and networking fundamentals
  • Experience running observability stacks in Kubernetes environments
  • Infrastructure as code experience (Terraform preferred)
  • Familiarity with incident management and on-call practices
  • Ability to debug production systems using metrics and logs

Requirements

  • Experience with logs and traces (e.g. Loki, Tempo, OpenTelemetry)
  • Experience operating large-scale or multi-cluster Kubernetes platforms
  • Experience with cloud platforms (GCP, AWS, OCI)
  • Exposure to SRE concepts such as error budgets and SLO-driven prioritisation

Benefits

  • Engineers trust dashboards and alerts to reflect system health
  • Incidents are detected earlier and diagnosed faster
  • Alert fatigue is reduced and on-call quality improves
  • Observability is treated as a first-class platform capability

Before You Apply

Be aware of the location restriction for this remote position: Bulgaria