Role Description
We're looking for an experienced Senior DevOps Engineer to join our small, collaborative infrastructure and platform team and take shared ownership of the systems that keep our high-traffic, consumer-facing platform running reliably, securely, and efficiently.
This is not a role for someone who wants to impose a new vision or rebuild from scratch. We have established patterns, tooling, and conventions that work, and we need someone who can learn them quickly, operate confidently within them, and over time contribute thoughtful improvements that make the whole team better. You'll bring a strong technical foundation, a practical mindset, and the communication skills to collaborate effectively across engineering, product, and leadership.
What You'll Do
-
Infrastructure Provisioning & Management
-
Design, implement, and maintain cloud infrastructure on AWS using Terraform and Ansible, following existing conventions and extending them thoughtfully.
-
Manage and support AWS services across our stack including EC2, ECS, RDS, S3, IAM, VPC, CloudFront, and related services.
-
Maintain and improve infrastructure-as-code practices, ensuring consistency, reproducibility, and auditability across environments.
-
Participate in capacity planning and cost optimization, identifying opportunities to improve resource efficiency without compromising reliability.
-
CI/CD & Deployment
-
Build, maintain, and improve CI/CD pipelines (GitHub Actions or equivalent) to support reliable, automated delivery across development, staging, and production environments.
-
Work with engineering teams to improve build speed, deployment safety, and rollback capabilities.
-
Support blue/green and canary deployment strategies as appropriate for our platform needs.
-
Reliability & Incident Response
-
Participate in on-call rotation and own production incidents end-to-end, from detection through root cause analysis, resolution, and post-mortem.
-
Use observability tooling (Datadog, CloudWatch, or equivalent) to monitor system health, establish alerting thresholds, and proactively surface issues before they impact customers.
-
Contribute to runbooks, incident documentation, and process improvements that reduce mean time to resolution over time.
-
Security & Compliance
-
Apply security best practices across infrastructure: IAM policy scoping, secrets management, network segmentation, vulnerability patching, and access controls.
-
Support compliance and audit requirements by maintaining clear documentation and ensuring infrastructure changes are tracked and reviewable.
-
Collaboration & Continuous Improvement
-
Work closely with the senior engineer on the team to learn existing systems deeply and contribute to architectural improvements over time.
-
Proactively identify areas for improvement (tooling, automation gaps, manual processes, reliability risks) and raise them constructively with the team.
-
Document infrastructure clearly so that other engineers can understand and operate the systems they depend on.
Qualifications
-
8+ years of professional DevOps, infrastructure, or platform engineering experience in production environments.
-
Hands-on proficiency with Terraform for infrastructure provisioning: writing modules, managing state, and working across environments.
-
Deep familiarity with AWS, including compute (EC2, ECS), storage and databases (S3, RDS), networking (VPC, Route 53, CloudFront), and IAM.
-
Experience with Ansible for configuration management and automation across server fleets or container environments.
-
Strong understanding of CI/CD principles and hands-on experience building or maintaining pipelines (GitHub Actions, GitLab CI, CircleCI, or equivalent).
-
Experience with Linux system administration, shell scripting (Bash), and general infrastructure debugging.
-
Demonstrated ability to work within an established infrastructure: understanding existing design decisions, following conventions, and improving incrementally rather than replacing wholesale.
-
Solid grasp of security fundamentals: IAM least-privilege, secrets management, network access controls, and patching hygiene.
-
Strong written and verbal communication skills in English, with the ability to collaborate asynchronously across time zones and document work clearly.
-
BSc in Computer Science, Engineering, or a related field, or equivalent professional experience.
-
Must be comfortable working from 2 PM to 11 PM IST.
Nice-to-Haves
-
Experience with container orchestration and managed Kubernetes services, particularly in cloud environments.
-
Familiarity with observability tooling such as Datadog, Prometheus, Grafana, or New Relic.
-
Experience with database operations on AWS RDS: backups, replication, failover, and performance tuning.
-
Familiarity with Redis, Sidekiq, or background job infrastructure in production environments.
-
Exposure to AI/ML infrastructure or familiarity with deploying and serving ML models (not required today, but increasingly relevant as our platform evolves).
-
Experience with cost allocation, tagging strategies, and AWS cost optimization practices.
-
Background in Agile or Scrum environments.
-
Familiarity with Microsoft Azure, including core compute, networking, and deployment services.
-
Experience with cloud cost management and FinOps practices, including tagging strategies, cost allocation, rightsizing, Reserved Instance or Savings Plan management, and working with tools like AWS Cost Explorer or equivalent.