[Hiring] SRE Engineer @Darwoft
SRE Engineer @Darwoft
Software Development
Salary unspecified
Remote Location
Employment Type full-time
Posted 1wk ago

1wk ago - Darwoft is hiring a remote SRE Engineer. 💸 Salary: unspecified 📍 Location: Worldwide

Role Description

We are seeking a Senior Site Reliability Engineer with a deep focus on observability and AI platform operations. This role sits at the intersection of reliability engineering and emerging AI infrastructure. You will own the instrumentation, visibility, and operational health of AI-powered systems, including LLM API gateways, token usage pipelines, and model serving infrastructure.

You will act as the authority on runtime behavior across our AI stack, building the tooling and insights required to understand, measure, and optimize system performance, reliability, and cost.

Responsibilities

  • Design and operate AI gateway infrastructure, including routing, rate limiting, and traffic shaping for LLM API traffic.
  • Build and maintain deep observability into AI workloads: token consumption, model latency, cost attribution, and error rates by model, team, and use case.
  • Define and track SLIs, SLOs, and error budgets for AI services and API-dependent workflows.
  • Instrument LLM-backed applications to surface prompt/completion telemetry, retry patterns, and quota burn rates.
  • Develop dashboards and alerting using Grafana, Loki, and Prometheus tailored to AI traffic patterns (beyond traditional infrastructure metrics).
  • Maintain and evolve observability pipelines capable of handling high-cardinality AI metadata.
  • Lead incident response for AI platform degradations, including model unavailability, gateway saturation, and upstream provider outages.
  • Automate operational workflows across AI infrastructure using Infrastructure as Code (IaC) and CI/CD practices.
  • Collaborate closely with ML/AI engineering teams to embed reliability and cost-visibility practices early in the development lifecycle.

Qualifications

  • Strong experience with AWS cloud services, specifically those relevant to AI workloads (Bedrock, SageMaker, Lambda, API Gateway).
  • Hands-on expertise with Kubernetes in production environments.
  • Proven experience building and operating observability stacks (Prometheus, Grafana, Loki), with an emphasis on application- and API-layer metrics.
  • Solid understanding of API gateway patterns, including routing, throttling, authentication, and traffic observability.
  • Experience instrumenting and monitoring LLM or AI API usage (token budgets, cost tracking, latency profiling).
  • Proficiency in Python, Go, or Bash for automation and tooling.
  • Mastery of Infrastructure as Code (Terraform) and CI/CD pipelines.
  • Strong analytical mindset with the ability to extract signal from high-cardinality telemetry.

Requirements

  • Experience operating or integrating AI gateway solutions (e.g., Kong AI Gateway, Portkey, LiteLLM).
  • Familiarity with OpenTelemetry and distributed tracing for AI/ML workloads.
  • Experience with FinOps practices for AI, including chargeback models and cost anomaly detection.
  • Knowledge of service mesh technologies and their role in AI traffic management.

Benefits

  • Contractor agreement with payment in USD.
  • 100% remote work in an international environment.
  • Access to Argentine public holidays.
  • Professional development in the cutting-edge field of AI Platform Engineering.
  • Referral program and access to learning platforms.
  • English classes to further enhance professional communication.