Role Description
We are seeking a skilled DevOps / Cloud Engineer to join the team responsible for managing our core network platform, OSS. In this role, you will design, deploy, and operate cloud and hybrid infrastructure, build and maintain CI/CD pipelines, and take ownership of running third-party vendor software in our environment. You will bridge the gap between infrastructure and application operations, ensuring our systems are scalable, secure, highly available, and cost-efficient.
Responsibilities
-
Design, deploy, and manage AWS cloud and hybrid infrastructure solutions using Infrastructure as Code (IaC) tools.
-
Deploy, operate, and maintain cloud environments across multi-account and multi-region AWS architectures.
-
Deploy, configure, and operate vendor-supplied software within our cloud/hybrid environment, serving as the operational owner for these applications.
-
Coordinate with vendors on installation, upgrades, patching, and configuration changes, translating vendor requirements into infrastructure and deployment solutions.
-
Ensure vendor applications meet our availability, performance, and security standards through our own monitoring and incident management processes.
-
Own the security, availability, and reliability of infrastructure, applying best practices for IAM, encryption, and vulnerability management.
-
Build, maintain, and improve CI/CD pipelines that automate testing, security scanning, and deployment workflows.
-
Automate infrastructure provisioning, configuration management, and operational tasks to eliminate manual toil.
-
Implement and maintain monitoring, alerting, and observability tooling to enable proactive issue detection and resolution.
-
Analyze cloud performance metrics and resource utilization to continuously optimize system efficiency and control costs.
-
Partner closely with internal and vendor teams to align on cloud infrastructure deployment and integration practices, bridging application code and the underlying infrastructure.
-
Provide technical guidance and mentoring to teammates, driving engineering excellence.
Qualifications
-
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
-
At least 3 years of hands-on experience in cloud infrastructure, DevOps, or site reliability engineering roles.
-
Public Cloud Expertise (AWS, Azure, or GCP): Proven hands-on experience with public cloud services, including compute, networking, databases, K8s, IAM, and monitoring.
-
Infrastructure as Code: Proficiency in Terraform or AWS CDK for building and managing infrastructure at scale.
-
CI/CD: Demonstrated experience designing and maintaining CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins, ArgoCD).
-
Containerization & Orchestration: Solid experience with Docker and K8s for application deployment and management.
-
Hybrid & Multi-Environment Operations: Experience operating workloads across cloud and on-premises or hybrid environments.
-
Third-Party Software Operations: Experience deploying and managing vendor-provided applications in cloud environments, including coordination of upgrades, patching, and configuration.
-
Databases: Hands-on experience with SQL (e.g., PostgreSQL, MySQL/RDS) and NoSQL (e.g., DynamoDB, Redis/ElastiCache) databases.
-
Messaging & Queuing: Familiarity with message-driven systems such as SQS, RabbitMQ, or Kafka.
-
Scripting & Automation: Proficiency in at least one scripting language (Python, Bash, or similar) for automation tasks.
Requirements
-
AWS or Azure/GCP certification (e.g., AWS Certified DevOps Engineer, Solutions Architect).
-
Skills in remote debugging and troubleshooting distributed systems.
-
Familiarity with security and compliance frameworks (e.g., SOC 2, ISO 27001).
-
Experience with observability platforms (e.g., Prometheus/Grafana, ELK).
-
English proficiency at B2 level or above; able to collaborate effectively with global, cross-functional teams.
-
Experience in software development is a plus.
-
Background in telecom, satellite, or other high-availability, mission-critical environments is a plus.
Soft Skills
-
Strong problem-solving mindset with a bias toward automation and operational efficiency.
-
Collaborative and communicative, comfortable working in a globally distributed team.
-
Ownership mentality: takes responsibility for the end-to-end reliability of systems under their care.
-
Adaptable and self-directed, with the ability to manage competing priorities in a fast-paced environment.
-
Meticulous attention to detail in documentation, change management, and operational procedures.
Technology Stack
-
Cloud - AWS (EC2, EKS, RDS, DynamoDB, ElastiCache, S3, Route 53, VPC networking, IAM, CloudWatch).
-
IaC - Terraform, AWS CDK.
-
Containers & Orchestration - Docker, K8s.
-
CI/CD - GitHub Actions, GitLab CI, Jenkins, ArgoCD (or similar).
-
Messaging - SQS, Kafka.
-
Scripting - Python, Bash.
-
Databases - PostgreSQL, MySQL, DynamoDB, Redis.
-
Monitoring & Observability - Prometheus, Grafana, CloudWatch.
-
Version Control - Git.
Physical Requirements
-
Ability to work in a standard office or remote home-office environment and use a computer for extended periods.
-
Ability to participate in occasional after-hours incident response.