Role Description
Muttdata is looking for a hands-on DevOps Engineer to join a strategic initiative focused on deploying and operating Data & AI platforms. The role centers on infrastructure automation, platform reliability, Kubernetes operations, and production-grade cloud infrastructure on AWS. This person will work closely with the Infrastructure and Enterprise Architecture teams, operating within established corporate security, networking, and compliance standards.
Responsibilities
- Infrastructure as Code on AWS
  - Deploy and maintain infrastructure on AWS using Terraform.
  - Work within the organization's corporate golden path, leveraging HCP (HashiCorp Cloud Platform) and HashiCorp Vault.
  - Ensure infrastructure complies with enterprise security, networking, and governance standards.
  - Collaborate with the Infrastructure and Enterprise Architecture teams on platform requirements and integrations.
- Administration & Operations of Data & AI Platforms
  - Operate and govern production-grade platforms running on Kubernetes / EKS.
  - Manage platforms such as Langfuse, LiteLLM, ClickHouse, and Redis, as well as future Data & AI tooling.
  - Design and maintain:
    - Backup and restore strategies
    - High Availability (HA) configurations
    - Shared caching and distributed rate-limiting mechanisms
    - Horizontal and vertical scaling strategies
    - Platform upgrades and dependency management
  - Integrate secrets and credentials management with Vault.
  - Troubleshoot production incidents and improve operational recovery processes.
  - Read and understand platform documentation to ensure each platform is deployed and operated optimally in Kubernetes environments.
- DevOps / SRE Automation
  - Build and maintain CI/CD pipelines using GitHub Actions.
  - Automate operational workflows and deployment processes.
  - Improve observability, monitoring, and operational reliability.
  - Create operational runbooks and reduce manual toil through automation.
  - Continuously improve platform stability, scalability, and developer experience.
Qualifications
- Must-have
  - Strong Terraform experience (production-level, not tutorial-based)
  - Solid AWS infrastructure experience
  - Kubernetes / EKS administration and operations
  - Containers and cloud-native infrastructure
  - SRE mindset and operational judgment
  - Ability to understand systems under the hood, not only operate tooling
- Nice-to-have
  - Python for automation
  - Database administration experience
  - ClickHouse
  - Redis
  - LiteLLM
  - Langfuse
  - Observability tools such as Prometheus, Grafana, or equivalent
Benefits
- In-company English lessons.
- Wellhub or sports club stipend to stay active.
- AWS & Databricks certifications fully covered.
- Food credits via Pedidos Ya, because great work deserves great food.
- Birthday off + an extra vacation week (Mutt Week!).
- Referral bonuses: help us grow the team & get rewarded!
- Annual Mutters' Trip: an unforgettable getaway with the team!