[Hiring] SRE Platform Engineer @GE Vernova
Category: DevOps
Salary: unspecified
Location: Remote
Employment type: Full-time
Posted: 3 weeks ago

3wks ago - GE Vernova is hiring a remote SRE Platform Engineer. 💸 Salary: unspecified 📍Location: Worldwide

Role Description

The Platform System Reliability Engineer is the primary operations engineer and operator of our EKS Kubernetes environment, which serves as the foundation for our global grid software SaaS products. This role focuses on the "middle-mile" of software delivery, ensuring that the underlying compute, networking, and storage layers are secure, hardened, scalable, and resilient to support critical energy infrastructure in the cloud. You will be responsible for the full lifecycle of production clusters, from initial bootstrapping through performance tuning, patching, and securing.

Qualifications

  • Bachelor's Degree in Computer Science or “STEM” Majors (Science, Technology, Engineering and Math) with advanced experience.
  • 6–8 years in SRE or Platform Engineering roles supporting mission-critical, 24/7 cloud environments.

Requirements

  • 5 years of experience operating production-grade Kubernetes clusters at scale.
  • Expert-level knowledge of multi-cluster management and performance tuning, plus experience implementing observability tools such as Prometheus/Grafana, Dynatrace, Splunk, or Datadog.
  • Deep hands-on experience with AWS core services (EKS, EC2, ALB, S3, RDS, MSK).
  • Proficiency in Terraform, Ansible, and Python or Go for infrastructure automation, and in deployment tools like ArgoCD or Flux.
  • Strong understanding of, and hands-on experience with, cloud networking concepts such as VPCs, routing, and load balancing, and security configurations such as encryption and certificate management.

Benefits

  • Relocation Assistance Provided: Yes
  • #LI-Remote - This is a remote position

Roles and Responsibilities

  • Day 0: Provision & Infrastructure Hardening
    • Kubernetes Cluster Orchestration: Help design and deploy hardened EKS clusters across multiple AWS regions, ensuring consistent security baselines.
    • Infrastructure as Code (IaC): Build and maintain reusable Terraform and Ansible modules for automated provisioning of cloud infrastructure services, including networking, compute, storage, queues, and caches.
    • Security Architecture: Implement "Policy as Code" guardrails and secure network perimeters (ESPs) in alignment with NERC CIP and IEC 62443 standards.
    • Operationalize Cloud Infrastructure: Standardize the runbooks and operating processes required to run critical infrastructure with the highest reliability.
  • Day 1: Platform Readiness & Scaling
    • Resource Governance: Define and enforce Kubernetes resource quotas, limit ranges, and Pod Priority classes to ensure mission-critical services receive prioritized compute resources.
    • Connectivity & Ingress: Manage the ingress strategy and service mesh architecture to facilitate secure, performant connectivity between distributed microservices.
    • Acceptance Testing: Lead platform-level smoke testing, load testing, and disaster recovery exercises to validate that the infrastructure can meet 99.99% uptime targets.
    • Sizing & Optimization: Partner with application teams to right-size containerized workloads, optimizing for both performance and cloud cost (FinOps).
  • Day 2: Operational Excellence & Tier 3 Support
    • L3 Escalation: Act as the highest technical escalation point for complex Kubernetes internals, troubleshooting issues such as failed pods, memory leaks, and network partitions.
    • Incident Response: Lead root cause analysis (RCA) for platform-level outages, implementing systemic fixes to prevent recurring failures.
    • Toil Elimination: Proactively identify and automate repetitive operational tasks—such as cluster upgrades and OS patching—to ensure the team spends at least 50% of their time on engineering improvements.
    • Observability Integration: Institutionalize platform monitoring using Prometheus and Grafana, creating dashboards that surface the "Golden Signals" of cluster health.
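
For a sense of scale, the 99.99% uptime target mentioned under Acceptance Testing can be turned into a downtime allowance with simple arithmetic. The sketch below is illustrative only: the function name and the 30-day and 365-day periods are assumptions made for the example, not part of the role description.

```python
# Downtime allowed by an availability target (illustrative arithmetic only).
# The function name and the chosen periods are assumptions for this example.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime permitted over period_days."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        minutes = allowed_downtime_minutes(99.99, days)
        print(f"99.99% over a {label}: {minutes:.2f} minutes of downtime")
```

Under these assumptions the allowance works out to roughly 4.3 minutes in a 30-day month and about 52.6 minutes in a 365-day year.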

Preferred Qualifications

  • Practical knowledge of NERC CIP, SOC2, ISO 27001, or IEC 62443 compliance standards in a SaaS context.
  • AWS Certified DevOps Engineer – Professional, CKA (Certified Kubernetes Administrator), or SRE Practitioner Certification.
  • Experience supporting mission-critical systems in energy, utilities, or other high-stakes industrial sectors.
  • Understands key cross-functional concepts that impact the organization; is aware of business priorities and organizational dynamics.
  • Coach and mentor team members.
  • Familiar with the concepts of costing hardware and software components. Works to ensure work is delivered on time and within budget.
  • Deliver tasks on time and in alignment with architectural goals. Can identify and raise issues, risks, and benefits.
  • Participate in change initiatives by implementing new directions and providing appropriate information and feedback.

Personal Attributes

  • High level of energy and enthusiasm with the ability to thrive in a rapidly changing environment.
  • Demonstrated customer focus – evaluates decisions through the eyes of the customer; builds strong customer relationships; creates processes with customer viewpoint; partners with customers.
  • Change oriented – actively generates process improvements; champions and drives change initiatives; confronts.
  • Ability to work with global teams, act independently and as part of a team.
  • Open-minded to new perspectives or ideas. Considers different or unusual solutions when appropriate.
  • Resolve day-to-day issues related to strategy implementation. Escalate issues that impact the client and/or strategic initiatives.
  • Strong analytical and problem-solving skills: communicates in a clear and succinct manner and effectively evaluates information and data to make decisions; anticipates obstacles and develops plans to resolve them.