Staff Software Engineer - Grafana Databases, Managed Services @Grafana Labs

Software Development

Salary gbp 103,958 - 1..	Remote Location European timezones, GMT (UTC+0), UTC-2, CAT (UTC-1), CET +/- 3 HOURS, GMT to GMT+4
Employment Type full-time	Posted 4wks ago

4wks ago - Grafana Labs is hiring a remote Staff Software Engineer - Grafana Databases, Managed Services. 💸 Salary: GBP 103,958 - 124,750 per year. 📍 Location: European timezones, GMT (UTC+0), UTC-2, CAT (UTC-1), CET +/- 3 HOURS, GMT to GMT+4

Role Description

The Managed Services team is a newly formed squad within the Databases department. It owns and operates shared, production-critical infrastructure that powers Grafana Cloud’s next-generation database products (Mimir, Loki, and Tempo). Today, this includes operating 100+ WarpStream clusters across multiple cloud providers and regions, with continued growth anticipated. WarpStream acts as the streaming backbone for ingestion and read/write decoupling across databases. It sits directly on the hot path for metrics, logs, and traces, handling high-throughput, multi-consumer workloads at massive scale.

In addition to streaming infrastructure, the team works closely with high-volume analytical and storage systems that power query-heavy and aggregation-heavy workloads, where latency, compression behavior, storage layout, and scaling characteristics matter deeply.

What You’ll Be Doing

Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnose and eliminate cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, and query performance regressions).
Design safe upgrade and rollout strategies at scale.
Improve observability, automation, and operational ergonomics.
Partner closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance.
Work directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, and compression trade-offs, among other areas.
Serve as a primary escalation point and on-call for relevant incidents.
Own the relationship with all system vendors, including WarpStream Labs and others.

At the Staff level

Help define and evolve the technical direction for operating WarpStream and adjacent shared database systems at scale.
Lead complex initiatives such as migrations, rollout improvements, and reliability investments.
Establish best practices around SLOs, scaling limits, failure isolation, and change safety.
Investigate and drive resolution of multi-layer incidents spanning storage, compute, networking, and control-plane dependencies.
Identify systemic risks across 100+ clusters and contribute architectural improvements that reduce recurring issues.
Reduce systems toil and improve operational ergonomics through automation.
Partner with database and platform teams to align on strategy and long-term scalability.
Mentor and support engineers as the team matures.

What Makes You a Great Fit

Regular 1:1s with your manager and close collaboration with teammates across regions.
Defining and evolving SLO strategy for shared database infrastructure.
Setting standards for diagnosability across core streaming and database systems in production.
Leading complex initiatives across high-throughput, multi-cloud infrastructure.
Designing and promoting fault-tolerant architectural patterns.
Defining rollout, migration, and upgrade safety practices.
Partnering with database and platform engineering leaders.
Leading design discussions and reviewing PRs.
Raising the bar for practices across teams by mentoring engineers.
Playing a key role in high-impact incident response.

Requirements

8+ years of engineering experience, including meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles.
Experience with high-throughput streaming systems, analytical or storage backends, or large-scale database infrastructure.
Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.).
Experience leading or driving complex technical efforts.
Ability to influence technical direction and align teams around reliability improvements.
Strong understanding of distributed systems failure modes in multi-cloud environments.
Proficiency in at least one systems-oriented language (Go preferred, but not required).
Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior.
Experience participating in blameless incident response and writing high-quality post-incident reviews.
Clear communicator who can collaborate across teams and work autonomously.
Intellectually curious, transparent, action-oriented, and kind.

Compensation & Rewards

In the United Kingdom, the base compensation range for this role is GBP 103,958 - GBP 124,750.
Actual compensation may vary based on level, experience, and skillset as assessed in the interview process.
Benefits include equity, bonus (if applicable) and other benefits.
All roles include Restricted Stock Units (RSUs).

Why You’ll Thrive at Grafana Labs

100% Remote, Global Culture.
Scaling Organization – Tackle meaningful work in a high-growth, ever-evolving environment.
Transparent Communication – Expect open decision-making and regular company-wide updates.
Innovation-Driven – Autonomy and support to ship great work and try new things.
Open Source Roots – Built on community-driven values that shape how we work.
Empowered Teams – High trust, low ego culture that values outcomes over optics.
Career Growth Pathways – Defined opportunities to grow and develop your career.
Approachable Leadership – Transparent execs who are involved, visible, and human.
Passionate People – Join a team of smart, supportive folks who care deeply about what they do.
In-Person Onboarding – We want you to thrive from day 1.
Balance Is Key – We operate a global annual leave policy of 30 days per annum.

Equal Opportunity Employer

We will recruit, train, compensate and promote regardless of race, religion, color, national origin, gender, disability, age, veteran status, and all the other fascinating characteristics that make us different and unique.

