[Hiring] Technical Architect - Machine Learning @Quantiphi
Artificial Intelligence
Salary: unspecified
Location: Remote
Employment type: Full-time
Posted: 6d ago

6d ago - Quantiphi is hiring a remote Technical Architect - Machine Learning. Salary: unspecified. Location: USA, Canada.

Role Description

We are seeking an experienced Senior Machine Learning Engineer to architect, build, and deploy production-grade agentic AI systems and multi-agent workflows from the ground up. The ideal candidate will have deep expertise in designing autonomous AI systems that can collaborate, reason, and execute complex tasks with minimal human intervention. You will be responsible for creating scalable, robust agentic workflows using frameworks like CrewAI and LangGraph, while ensuring enterprise-grade deployment on major cloud platforms.

Roles & Responsibilities

  • Agentic System Architecture & Development:
    • Architect & Build Agentic Systems:
      • Design and develop end-to-end multi-agent systems from scratch.
      • Create foundational agent harnesses, define communication protocols, and build orchestration layers using frameworks like CrewAI, LangGraph, and AutoGen.
      • Ensure hierarchical and collaborative multi-agent structures with well-defined agent roles, responsibilities, and communication protocols.
      • Implement dynamic task decomposition, sophisticated tool integration, planning mechanisms (ReAct), and self-correction loops.
      • Develop state management systems and memory mechanisms for persistent agent interactions.
    • Engineer Advanced Agent Capabilities:
      • Develop custom agent-tools and define specialized agent-skills that empower agents to perform complex, domain-specific tasks.
    • Pioneer Context Engineering:
      • Implement advanced context engineering and memory systems to ensure agents maintain state, learn from interactions, and make informed decisions in dynamic environments.
    • Deploy Production-Grade Solutions:
      • Own the deployment, scaling, and maintenance of robust, low-latency agentic systems on major cloud platforms (GCP, AWS, or Azure).
      • Implement best-in-class MLOps practices for monitoring, continuous integration/continuous deployment (CI/CD), and system reliability.
    • Integrate and Optimize LLMs:
      • Integrate LLMs to serve as the core reasoning engines for autonomous agents.
      • Apply advanced techniques like RAG and PEFT to optimize performance.
    • Tool Development & RAG Integration:
      • Create and maintain comprehensive tool libraries for agents including API integrations, database queries, and external service connections.
      • Design and implement RAG systems using vector databases (Pinecone, Weaviate, ChromaDB).
      • Develop custom tools and plugins that enable agents to interact with various enterprise systems and APIs.
      • Ensure tool reliability, error handling, and seamless integration within agentic workflows.
    • Observability, Monitoring & Evaluation:
      • Implement comprehensive monitoring and tracing systems for agent behavior, performance, cost optimization, and latency analysis.
      • Design novel evaluation frameworks to assess multi-step agentic task success, reliability, and accuracy.
      • Utilize advanced observability tools (LangSmith, Arize AI, or custom solutions) to trace agent decision-making processes.
      • Establish metrics and KPIs for measuring agentic system performance in production environments.

Qualifications

  • 6-8 years of hands-on experience in machine learning and AI engineering with a proven track record of taking ML systems to production.
  • Demonstrated expertise in building multi-agent systems and agentic workflows, preferably with LangGraph/CrewAI.

Requirements

  • Technical Skills - Must Have:
    • Expert-level Python proficiency with ML frameworks (TensorFlow, PyTorch, Transformers).
    • Experience with FastAPI, async programming, and microservices architecture.
    • Hands-on experience with vector databases (Pinecone, Weaviate, ChromaDB) and building scalable RAG systems.
    • Experience with LLM application monitoring tools (LangSmith, Weights & Biases, custom telemetry solutions).
    • Proven ability to architect and implement complex AI systems from scratch in production environments.
    • Production-level experience with at least one major cloud platform (AWS, GCP, or Azure).
    • Strong skills in Infrastructure as Code (Terraform, CloudFormation), CI/CD pipelines (GitHub Actions, Jenkins), and containerization (Docker, Kubernetes).
  • Technical Skills - Good to have:
    • Experience with prompt engineering techniques, fine-tuning SLMs (PEFT, SFT, RLHF), and model optimization.
    • Knowledge of distributed systems, message queues, and event-driven architectures for agent coordination.
    • Familiarity with SDLC best practices, version control (Git), and agile development methodologies.
    • Experience with tool-calling agents, multi-step workflows, and stateful orchestration (e.g. graphs, planners, routers).
    • Hands-on evals for agents: trajectory/tool-use checks, golden traces, LLM-as-judge with fixed rubrics, regression suites.
    • Online evals, drift thinking, and clear quality gates before or after deploy (thresholds, alerts, rollback criteria).
    • Safety and abuse: prompt injection via tools, untrusted retrieval, PII handling in prompts and logs, allowlists and guardrails.
    • Cost and latency discipline: budgets per run, timeouts, caps on turns and tool calls.
    • Model lifecycle: routing/gateway patterns, version pinning, fallbacks, and which model for which step.
    • Memory and state: what is persisted, retention, redaction, and what must never be stored.
  • Soft Skills:
    • Exceptional problem-solving and analytical thinking with the ability to tackle complex, ambiguous challenges.
    • Strong communication skills to explain complex agentic concepts to both technical and non-technical stakeholders.
    • Proven ability to work independently and drive large-scale projects to completion with minimal supervision.
    • Leadership mindset with experience mentoring team members and driving technical excellence.

Benefits

  • Be part of a trailblazing team that's shaping the future of AI, ML, and cloud innovation.
  • If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!