[Remote] Principal AI Platform Engineer
Note: This is a remote position open to candidates in the USA. MaintainX is the world's leading mobile-first work execution platform for industrial and frontline teams. They are seeking a Principal AI Platform Engineer to own the technical vision for the AI platform, leading architecture and strategic decisions to enhance product features and drive engineering-wide adoption.
Responsibilities
- Define the Agent / Skill / Tool architecture. Evolve the platform so that agents can reason, plan, and collaborate; skills are discoverable and reusable across workflows; and tools expose structured, permission-aware access to operational data. Design for progressive autonomy as trust, reliability, and governance mature
- Scale the context graph. Architect the retrieval and knowledge systems that turn 14,000+ digitized equipment manuals, 370,000+ procedures created yearly, and 27M+ annual work orders into customer-specific intelligence with cross-asset reasoning
- Build the developer experience. Own agent orchestration, an MCP tool registry, reproducible dev environments, and the observability layer that makes agents trustworthy at scale. Engineers should define agent behavior through schemas and prompts while the platform handles routing, validation, evaluation, and observability
- Build the evaluation and feedback loop. Design offline and online evaluation systems so every user interaction (accept, refine, override) improves agent performance
- Drive cross-division alignment. Partner with product engineering teams across Plant Setup, Maintenance Planning, Maintenance Execution, Reliability Engineering, Parts & Purchasing, and Reporting. Represent MaintainX engineering internally and externally
Skills
- 10+ years of software engineering experience, with significant depth in backend systems, distributed architecture, or platform / infrastructure engineering
- 3+ years building ML or LLM-powered systems in production, with real operating experience around evaluation, latency, cost, and reliability
- A track record of company-wide technical leadership at the Principal level. You have defined architecture for systems used by multiple teams, influenced engineering strategy beyond a single division, and held leaders accountable for adoption
- Internal platform experience where your customers are other engineers. You treat developer experience as a product: adoption, documentation, enablement, and ergonomics are first-class
- Strong Python backend expertise, with experience designing APIs, services, and data pipelines
- The ability to write a clear design document, facilitate a technical decision across teams, and represent engineering externally with credibility
- Production agentic systems: multi-agent orchestration, tool use, skill registries
- MCP, LangGraph, LlamaIndex, plus evaluation and observability platforms like Langfuse, LangSmith, or Braintrust
- RAG pipelines, vector databases, embedding strategies, and knowledge graphs at scale
- B2B SaaS, industrial software, IoT / OT, or other domains where AI must operate with high reliability and domain-specific context
Benefits
- Competitive base, equity, and variable comp aligned to role and location.
- Equity in a high-growth, post-Series D company.
- Day-1 health, dental, and vision coverage.
- Unlimited PTO, which we actually take.
- Flexible token limits.
- Work from our San Francisco, Toronto, or Montreal hubs, or remote across the United States and Canada.