Role Description
We're looking for a DevOps Engineer to own our production and developer infrastructure end-to-end. You'll be the person who makes sure deployments are smooth, environments are reliable, and developers can move fast without breaking things. You will have real hands-on ownership over how we build, ship, and run software.
What You'll Own
- Production Infrastructure: ECS Fargate clusters, ALB, CloudFront CDN, Route 53 DNS, VPC networking (public/private subnets, NAT gateways, VPC peering across accounts)
- Infrastructure as Code: Terraform modules managed via Terragrunt across environments
- Bedrock / LLM Services: Substantial agent-driven workflows, ASR/TTS, and internal and client-facing agents, among others
- CI/CD Pipelines: GitHub Actions workflows for build, test, deploy, and promotion, including automated staging/production promotion via Slack triggers
- Database Operations: DynamoDB (GSIs, streams) and PostgreSQL on RDS
- Serverless: Lambda functions for document processing, email handling, SMS webhooks, DynamoDB stream processing, etc.
- Messaging: SQS FIFO queues and SNS topics for event-driven architecture
- Monitoring & Alerting: CloudWatch dashboards, alarms, structured logging (Pino), health checks
- Developer Experience: LocalStack-based local development environment, Docker Compose stacks, Makefile automation, developer sandbox provisioning on real AWS
- Security: IAM policies, KMS encryption, Secrets Manager rotation, ACM certificates, security groups, SOC2 compliance logging
Qualifications
- 4-6+ years of hands-on DevOps/infrastructure/SRE experience
- Deep AWS expertise: ECS/Fargate, DynamoDB, RDS, Lambda, SQS/SNS, S3, CloudFront, VPC networking, IAM, Secrets Manager, KMS
- Terraform proficiency: You've written and maintained real Terraform modules in production, not just applied example configs. Terragrunt experience is a plus.
- CI/CD ownership: GitHub Actions (or similar). You've built and maintained deployment pipelines, not just used them.
- Container experience: Docker multi-stage builds, ECS task definitions, container orchestration
- Node.js/TypeScript familiarity: You don't need to be a Node expert, but you should be comfortable debugging build issues, optimizing Docker images, and understanding the runtime.
- Monitoring mindset: You set up alerting before things break, not after. Experience with CloudWatch, structured logging, and incident response.
- Security awareness: IAM least-privilege, secrets rotation, network isolation, encryption at rest and in transit
Nice to Have
- Experience with LocalStack for local AWS emulation
- PostgreSQL administration (RDS multi-AZ, automated backups, secret rotation)
- Serverless architecture (Lambda, event-driven patterns)
- SOC2 compliance experience
- Experience at a small company where you wore multiple hats
- Cost optimization on AWS (you've actually looked at a bill and done something about it)
Soft Skills
- Is a strong team player: you can communicate your vision to teammates and support others in pursuing theirs
- Is a capable strategic partner: you quickly grasp business and product context, so you can contribute to what we're doing and why, and you recognize and fill gaps as needed
- Is highly self-motivated and can own projects end-to-end
- Can write thorough, scalable, and clear documentation
- Says "hello cassi!" in a cover letter.
- Has strong attention to detail: proofreads and reviews any AI-generated content
- Is inquisitive by nature and can dig into inconsistencies and pinpoint issues
Requirements
- Bachelor's degree in Computer Science/related field or commensurate experience