[Remote] Lead AI Platform Engineer
Note: This is a remote position open to candidates in the USA. EPAM Systems is seeking a Lead AI Platform Engineer to lead the design and evolution of a next-generation AI Gateway platform. The role owns the architecture for secure, scalable access to Large Language Models (LLMs) and agentic workflows, ensuring consistent integration and governance across the enterprise.
Responsibilities
- Develop the AI Gateway platform, including MCP (Model Context Protocol) support, agents, LLM routing, governance and observability
- Translate business and product requirements into clear problem statements, architectural designs and scalable technical solutions
- Design extensible frameworks for MCP server capabilities and integrations, agent orchestration and reusable agent skills
- Architect gateway capabilities such as semantic caching for LLM responses; rate limiting, quotas and traffic governance; and guardrails and validation layers for safe LLM/MCP usage
- Design and review authentication and authorization models, including multiple OIDC-based flows, identity propagation and token-based access control for MCP and LLM traffic
- Build prototypes and reference implementations to validate architectural decisions and guide engineering teams
- Partner with multiple product teams integrating with the AI Gateway, providing architecture guidance, best practices and integration patterns
- Review designs and code, providing actionable feedback to ensure quality, performance, scalability and security
- Define and review CI/CD best practices using modern GitHub-based pipelines
- Architect and review Kubernetes-based deployment models, ensuring scalability, resiliency and production readiness
Skills
- 5+ years of software engineering experience
- Proven experience designing and delivering large-scale distributed systems or platform products
- Proficiency in Golang and Python
- Familiarity with AI-centric development, with a strong emphasis on quality gates and architecture patterns that keep AI coding agents producing high-quality code
- Deep understanding of AI gateways, AI/LLM platforms or middleware systems, including routing, governance and scalability
- Experience with microservices and service-oriented architectures for multi-tenant SaaS environments, including PostgreSQL for transactional data and Redis for caching and distributed coordination
- Strong background in authentication and authorization systems, particularly OAuth-based approaches
- Demonstrated ability to independently drive problem definition, architecture, prototyping and execution guidance
- Experience designing platforms and frameworks used by multiple product teams
- English proficiency at B2 level or higher
- Experience with MCP (Model Context Protocol), agentic platforms or AI orchestration frameworks
- Knowledge of, or hands-on experience with, EDA software
- Prior work on LLM gateways, API gateways or AI middleware platforms
- Experience building generic developer frameworks or SDKs adopted across teams
- Familiarity with guardrails, semantic caching, prompt/response optimization and LLM cost control techniques
Benefits
- Delivering innovative solutions to industry leaders, making a global impact
- Enjoyable working environment, whether in a vibrant office or from the comfort of your home
- Opportunity to work abroad for up to two months per year
- Relocation opportunities within our offices in 55+ countries
- Corporate and social events
- Leadership development, career advising, soft skills and well-being programs
- Certifications, including GCP, Azure and AWS
- Unlimited access to LinkedIn Learning and Udemy
- Free English classes with certified teachers
- Participation in the Employee Stock Purchase Plan
- Monetary bonuses for engaging in the referral program
- Comprehensive medical & family care package
- Four trust days per year for personal needs
- Discounts for fitness clubs
- Benefits package (hotels, restaurants, stores and services)