[Hiring] Principal Platform Engineer @underdefense
Category: DevOps
Salary: unspecified
Location: Remote (Ukraine)
Employment type: Full-time
Posted: 2d ago

2d ago - underdefense is hiring a remote Principal Platform Engineer. Salary: unspecified. Location: Ukraine.

Role Description

Principal Platform Engineer - Remote. Supporting a production ML platform on Google Cloud.

Experience: 8-10+ years in DevOps with at least 2 of those years operating and maintaining ML production workloads.

Core Responsibilities

  • Infrastructure Management: Design, deploy, and maintain elastically scaling cloud infrastructure on GCP, along with container orchestration tools such as Kubernetes, for high-performance ML workloads.
  • CI/CD Pipeline Development & Maintenance: Build and maintain automated pipelines for training, testing, and deploying machine learning models using tools like Jenkins, GitHub Actions, or Airflow.
  • Model Monitoring & Maintenance: Implement observability tools to track model drift, accuracy, latency, and performance degradation in production.
  • Collaboration: Bridge the gap between data engineers, ML engineers, and backend and frontend engineers to ensure smooth production operation.
  • Monitoring Tooling: Deploy tools that empower individual teams to monitor their own workloads. For ML workloads, implement comprehensive monitoring of system health (latency, uptime) alongside ML-specific metrics such as feature drift, prediction accuracy, and data distribution shifts, to ensure long-term model reliability. Also cover monitoring of non-ML workloads and production metrics.
  • On-Call & Compliance: Participate in the on-call rotation and help manage the security posture needed to comply with standards such as SOC 2.

Qualifications

  • GCP at depth: IAM, org policies, VPC Service Controls, Secret Manager, Artifact Registry, Cloud DNS. Multi-project estate design (admin / apps / data separation across dev / QA / prod).
  • Kubernetes / GKE at depth: cluster topology, upgrades, node pools, Kustomize overlays, Helm. In-cluster operators: ArgoCD, ESO, cert-manager, argo-rollouts, cloudnative-pg, external-dns, kubescape.
  • Istio service mesh at depth: VirtualServices, Gateways, ingress passthrough, sidecar injection, mTLS (PeerAuthentication), telemetry. Istio-native across every workload.
  • Kong API gateway: Kong Operator + Kong Gateway.
  • Terraform + Atlantis: module design, state management, multi-hundred-file estate, GitOps TF workflow.
  • In-depth familiarity with GitHub and GitHub Actions: ArgoCD in production (Kustomize-based), release-promotion pipelines, GitHub workflows.
  • Secrets management: GCP Secret Manager + External Secrets Operator; familiarity with SOPS or equivalent GitOps-on-secrets patterns.
  • Identity / auth: Auth0 (Terraform provider), Dex (in-cluster), Google Groups for IAM.
  • Networking + security: VPC-SC perimeter design, private GKE, GCP load balancers, in-cluster security scanning, SOC 2 posture, supply-chain hygiene.
  • Data / ML orchestration: operating Airflow in production and an ML-serving stack (Triton, vLLM, LiteLLM, MLflow, Opik).
  • Databases: Cloud SQL for PostgreSQL (regional HA, private IP, SSL-enforced, Google sql-db TF module); BigQuery (datasets, tables, IAM, scheduled MERGE queries); in-cluster PostgreSQL via cloudnative-pg; Elasticsearch in-cluster; ClickHouse (external) a plus; GCS as object / model store.
  • Automation / bootstrap: Ansible (cluster bootstrap and recovery).
  • Scripting: Python, Bash.

Requirements

  • Experience with continuous monitoring of model accuracy.
  • Experience detecting data drift and concept drift.
  • Experience setting up alerts for anomalies or performance drops.
  • Experience logging and auditing predictions.
  • Kubernetes certification (CKA / CKAD / CKS), GCP Professional Cloud Architect or Security Engineer certification, ClickHouse ops, Loki.

Before You Apply

Be aware of the location restriction for this remote position: Ukraine