Back to Remote jobs  >   AI / ML
Certified AI Infrastructure and Kubernetes Platform Architect @IT Search Corp
AI / ML
Salary usd 110 - 150 p..
Remote Location
πŸ‡ΊπŸ‡Έ USA Only
Job Type full-time
Posted 2d ago

[Hiring] Certified AI Infrastructure and Kubernetes Platform Architect @IT Search Corp

2d ago - IT Search Corp is hiring a remote Certified AI Infrastructure and Kubernetes Platform Architect. πŸ’Έ Salary: usd 110 - 150 per hour πŸ“Location: USA

Role Description

We are seeking a highly skilled AI Infrastructure and Kubernetes Platform Architect with deep expertise in managing GPU-accelerated workloads on NVIDIA DGX systems. The ideal candidate will have hands-on experience with Kubernetes at the administrator, application developer, and security levels (CKA, CKAD, CKS), and will be responsible for designing, deploying, securing, and maintaining large-scale AI infrastructure powered by DGX BasePODs and SuperPODs. This role involves optimizing AI workloads, managing high-performance networking (InfiniBand), and ensuring operational excellence across NVIDIA AI systems and BlueField DPU environments.

Key Responsibilities

  • Kubernetes and AI Platform Orchestration
    • Architect and maintain containerized AI/ML platforms using Kubernetes on DGX systems.
    • Integrate NVIDIA Base Command Manager with Kubernetes for workload scheduling and GPU resource optimization.
    • Design multi-tenant GPU resource partitioning strategies using MIG (Multi-Instance GPU) to maximize hardware utilization across concurrent AI workloads.
    • Implement and manage Helm charts, custom controllers, and GPU operators for scalable ML infrastructure.
  • DGX Infrastructure Administration
    • Administer and optimize NVIDIA DGX BasePODs and SuperPODs.
    • Ensure optimal GPU, CPU, and storage performance across AI clusters.
    • Leverage DGX System Administration best practices for lifecycle management and updates.
    • Coordinate capacity planning for DGX cluster expansion including rack power, cooling, and storage integration with NVIDIA AI Enterprise software stack.
  • High-Performance Networking & DPU
    • Deploy, monitor, and manage InfiniBand networks using Unified Fabric Manager (UFM).
    • Integrate BlueField DPUs for offloaded security, networking, and storage tasks.
    • Optimize end-to-end data pipelines from storage to GPUs.
  • Security and Compliance
    • Apply best practices from the CKS certification to harden Kubernetes clusters and AI workloads.
    • Implement secure service mesh and microsegmentation with BlueField DPU integration.
    • Conduct regular audits, vulnerability scanning, and security policy enforcement.
  • Automation & Monitoring
    • Automate deployment pipelines and infrastructure provisioning with IaC tools (Terraform, Ansible).
    • Monitor performance metrics using GPU telemetry, Prometheus/Grafana, and NVIDIA DCGM.
    • Troubleshoot and resolve complex system issues across hardware and software layers.
    • Implement MLOps workflows integrating KubeFlow Pipelines, NVIDIA Triton Inference Server, and model registry tooling to support end-to-end model training and production deployment.

Qualifications

  • CKA, CKAD, CKS certifications – demonstrating full-stack Kubernetes expertise.
  • Proven experience with NVIDIA DGX systems and AI workload orchestration.
  • Hands-on expertise in InfiniBand networking, UFM, and BlueField DPU administration.
  • Strong scripting and automation skills in Python, Bash, YAML.
  • Familiarity with Base Command Manager, NVIDIA GPU Operator, and KubeFlow is a plus.
  • Ability to work across teams to support ML researchers, DevOps engineers, and infrastructure teams.
Before You Apply
️
πŸ‡ΊπŸ‡Έ Be aware of the location restriction for this remote position: USA Only
β€Ό Beware of scams! When applying for jobs, you should NEVER have to pay anything. Learn more.
Back to Remote jobs  >   AI / ML
Certified AI Infrastructure and Kubernetes Platform Architect @IT Search Corp
AI / ML
Salary usd 110 - 150 p..
Remote Location
πŸ‡ΊπŸ‡Έ USA Only
Job Type full-time
Posted 2d ago
Apply for this position
Did not apply βœ“
Applied βœ“
Sent Follow-Up βœ“
Interview Scheduled βœ“
Interview Completed βœ“
Offer Accepted βœ“
Offer Declined βœ“
Unlock 152,720 Remote Jobs
️
πŸ‡ΊπŸ‡Έ Be aware of the location restriction for this remote position: USA Only
β€Ό Beware of scams! When applying for jobs, you should NEVER have to pay anything. Learn more.
Apply for this position
Did not apply βœ“
Applied βœ“
Sent Follow-Up βœ“
Interview Scheduled βœ“
Interview Completed βœ“
Offer Accepted βœ“
Offer Declined βœ“
Unlock 152,720 Remote Jobs
Γ—

Apply to the best remote jobs
before everyone else

Access 152,720+ vetted remote jobs and get daily alerts.

4.9 β˜…β˜…β˜…β˜…β˜… from 500+ reviews
Unlock All Jobs Now

Maybe later