[Remote] Principal AI/ML Architect
Note: This is a remote position open to candidates in the USA. Caylent is a cloud-native services company that helps organizations leverage technology using Amazon Web Services (AWS). The company is seeking a Principal AI/ML Architect to lead client engagements, shape strategy, and provide architectural guidance for machine learning projects, ensuring technical quality and driving business value for customers.
Responsibilities
- Lead end-to-end ML assessments across infrastructure, data pipelines, model lifecycle, and organizational readiness — producing recommendations that drive executive decision-making and earn Caylent the next engagement
- Partner with sales and solutions teams through the proposal and scoping phase, contributing the technical depth needed to shape well-grounded statements of work
- Serve as the senior technical authority on client engagements, possibly across multiple projects simultaneously: provide architectural guidance, ensure the technical quality of your project team's work, and get hands-on when the engagement demands it, without owning day-to-day implementation responsibilities
- Own or orchestrate high-quality POCs that give customers confidence before committing to a larger initiative
- Advise customers on ML operations standards and architecture — covering MLOps pipeline design, model lifecycle management, LLMOps patterns, and production monitoring frameworks — translating operational complexity into decisions and guardrails their teams can own and sustain
- Shape how Caylent wins its most technically complex opportunities — contributing the architectural thinking and credibility that turns prospects into customers
- Strengthen the ML practice from the inside — through peer guidance, technical interviews, and contributions to accelerators, reference architectures, and thought leadership content
Skills
- 10+ years in machine learning or AI, with a proven track record of leading client-facing engagements in a consulting or advisory capacity
- Deep, current knowledge of the AWS ML and GenAI ecosystem, with the ability to make and defend architectural decisions across the full ML lifecycle — from data and feature engineering through training, deployment, and monitoring
- Deep expertise in two or more ML domains (for example, traditional ML, computer vision, NLP, or time series), combined with the judgment to assess, architect, and advise across the broader ML landscape
- Proven ability to architect and govern production ML systems end-to-end, translating MLOps, LLMOps, and broader AI operations complexity into standards and decisions that engineering teams can execute and executives can act on
- Deep expertise in foundation model adaptation, including fine-tuning (LoRA, QLoRA, PEFT), alignment (RLHF, DPO), inference optimization (quantization, vLLM), and distributed training (DeepSpeed, FSDP), combined with RAG and agentic system design on AWS, including multi-agent architectures, event-driven workflows, MCP integration, and human-in-the-loop patterns. You bring the technical authority to prescribe the right approach and set architectural standards that teams can execute against
- Proven ability to operate independently in complex customer environments — navigating ambiguity, aligning stakeholders, and translating ML tradeoffs into business risk and value for both technical and executive audiences
- AWS Certified Machine Learning – Specialty and/or AWS Certified Solutions Architect – Professional
- Experience shaping practice-level standards, reference architectures, and reusable ML accelerators across multiple engagements
- Exposure to varied industries and problem types in a consulting or client-facing context
- Deep fluency in responsible AI practices — model evaluation, bias detection, fairness frameworks, and AI governance — applied in enterprise deployments
- Hands-on experience designing and deploying SRE agents and AI-driven operations workflows in production — spanning automated incident detection, triage, and remediation — with the ability to integrate across observability platforms and translate AI operations outcomes into measurable business value
Benefits
- 100% remote work
- Private Health Insurance
- Flexible Time Off
- Competitive phantom equity
- Exam and certification fees covered
- Peer bonus awards
- State-of-the-art laptop and tools
- Equipment & Office Stipend
- Individual professional development plan
- Annual stipend for Learning and Development
- Work with an amazing worldwide team and in an incredible corporate culture