Senior MLOps Platform Engineer @ARKA Group, LP
AI / ML
Salary unspecified
Remote Location
🇺🇸 USA Only
Job Type full-time
Posted 2wks ago

[Hiring] Senior MLOps Platform Engineer @ARKA Group, LP

2wks ago - ARKA Group, LP is hiring a remote Senior MLOps Platform Engineer. 💸 Salary: unspecified 📍 Location: USA

Role Description

Our AI Center of Excellence builds the next generation of Agentic AI products that autonomously reason, plan, and act on behalf of our customers. To deliver these capabilities at scale, we need a platform engineering group that provides a robust, secure, and highly available MLOps foundation across both on-premises clusters and AWS. The team works closely with data scientists, product engineers, and SREs to turn experimental models into reliable services that power mission-critical applications.

In support of work/life balance, many positions offer a flexible schedule within the pay period. Ask us about flex scheduling if that's of interest to you.

Responsibilities

  • Design, implement, and operate a unified MLOps platform that supports both on-premises Kubernetes clusters and AWS.
  • Develop reusable CI/CD pipelines (GitLab CI) for model packaging, containerization, automated testing, canary releases, and rollbacks.
  • Build observability, monitoring, and alerting stacks (Prometheus, Grafana, OpenTelemetry, CloudWatch) to track inference latency, throughput, resource utilization, and data drift for real-time and batch workloads.
  • Create self-service tooling (CLI, SDKs, UI dashboards) that allows data science and product teams to register models, define inference endpoints, and manage versioning without deep DevOps involvement.
  • Architect and maintain data pipelines that feed training data, model artifacts, and inference logs into a governed data lake (S3, on-prem object store).
  • Collaborate with research and product engineers to translate experimental Agentic AI prototypes into production-grade services, ensuring reproducibility, security, and compliance.
  • Drive performance optimization for inference workloads (GPU/CPU scaling, model quantization, batching strategies).
  • Champion best practices in security (IAM, network policies, secret management), cost efficiency, and disaster recovery for the hybrid infrastructure.
  • Mentor junior engineers and contribute to internal knowledge bases, upskilling, and review processes.

Qualifications

  • BS in computer science or related engineering field.
  • 5+ years of experience building and operating production-grade software infrastructure, preferably in a hybrid on-prem/cloud environment.
  • Deep expertise with Kubernetes (cluster provisioning, Helm, operators, custom resources) and container runtimes (Docker, OCI).
  • Hands-on experience with AWS services (EKS, SageMaker, S3, IAM, CloudWatch, Step Functions) and the ability to bridge on-prem resources with AWS via VPN/Direct Connect.
  • Strong software engineering skills in Python and at least one compiled language (Go, Rust, or Java) for building platform components and SDKs.
  • Proficiency with CI/CD and GitOps tooling (Argo CD, Flux, GitLab, GitHub Actions, or similar).
  • Solid understanding of distributed systems (consensus, fault tolerance, load balancing) and experience tuning high-throughput, low-latency inference pipelines.
  • Experience with data engineering frameworks (Airflow, Prefect, Kafka, Spark, Flink) and building robust, versioned data pipelines.
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry, ELK) and the ability to define meaningful SLIs/SLOs for AI services.
  • Track record of collaborating with research or product teams to move prototypes to production, translating experimental code into maintainable services.
  • Strong problem-solving mindset, excellent written and verbal communication, and a passion for building scalable AI platforms.

Preferred Qualifications

  • Working knowledge of Scrum and Agile software development methodology.

Location

This is a remote position that will primarily be supporting our Aurora, CO and King of Prussia, PA locations. Due to contract requirements, the job has to be performed from a remote location in the United States.

Benefits

  • Comprehensive medical/vision/dental insurance packages.
  • Company contributions to qualified HSA accounts.
  • 401k retirement plan with industry-leading company contributions.
  • 3 weeks of vacation accrual per year plus time off for sick leave and unscheduled life events.
  • 13 paid holidays.
  • Upfront tuition assistance for approved degree programs.
  • Annual bonus program based on company and employee performance.
  • Company paid life insurance, AD&D, Short-Term and Long-Term disability insurance.
  • 4 weeks paid Parental Leave.
  • Employee assistance program (EAP).