[Remote] Platform Engineer

Remote | Full-time | Hiring now

Note: This is a remote position open to candidates in the USA. Hyrhub is seeking a Senior Infrastructure Architect / Platform Engineer for its AI/ML platform to provide technical leadership for the cloud platforms that support enterprise-scale generative AI applications. The role involves defining infrastructure architecture, leading platform standards, and collaborating with engineering teams to raise operational maturity across AI platforms.

Responsibilities

  • Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services
  • Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA
  • Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments
  • Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals
  • Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics
  • Build platform capabilities for GenAI workloads, including model availability monitoring
  • Partner with software engineering teams to improve deployment reliability, rollback strategies, health checks, autoscaling, load testing, and runtime performance
  • Define and enforce security and compliance practices for infrastructure, including IAM permission boundaries, Secrets Manager usage, secret scanning, audit logging, tagging standards, and change-management controls
  • Provide technical leadership for cost optimization, capacity planning, environment standardization, and operational resilience across development, test, production, and sandbox environments
  • Mentor engineers, review architecture and infrastructure designs, and influence platform engineering practices across teams
  • Troubleshoot complex production issues across cloud infrastructure, networking, containers, serverless workloads, CI/CD systems, and observability platforms
  • Translate enterprise requirements for security, compliance, reliability, and governance into pragmatic engineering standards and automation

Skills

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience
  • 7+ years of experience in DevOps, platform engineering, cloud infrastructure, site reliability engineering, or software engineering roles
  • Strong hands-on experience with AWS/Azure/GCP infrastructure and services, including container, serverless, networking, storage, observability, and security services
  • Experience designing and operating production systems on Kubernetes, ECS/Fargate, or comparable container orchestration platforms
  • Proficiency with infrastructure-as-code, especially CloudFormation, Terraform, Helm, or similar tooling
  • Strong CI/CD experience with GitHub Actions or similar platforms, including reusable workflows, automated testing, deployment gates, and cloud authentication
  • Experience building and operating observability solutions using CloudWatch, Prometheus/Grafana, OpenSearch, or similar tools
  • Strong understanding of cloud security practices, IAM, secrets management, least-privilege access, audit logging, and compliance requirements
  • Experience supporting distributed systems, microservices, APIs, asynchronous workloads, and multi-environment deployments
  • Demonstrated ability to lead technical design, mentor engineers, and influence engineering practices across teams
  • Experience supporting AI/ML or generative AI platforms, including LLM gateways, model routing, prompt observability, token metering, or model failover
  • Experience operating platforms in regulated enterprise environments, ideally healthcare, pharmaceutical, finance, or life sciences
  • Experience with multi-account, multi-region AWS architectures and enterprise governance patterns
  • Experience with cost optimization, autoscaling strategies, capacity planning, and cloud budget monitoring
  • Experience with load testing and performance validation using tools such as Locust or comparable frameworks
  • Strong Python or scripting skills for platform automation, operational tooling, and CI/CD extensions
  • Ability to communicate complex technical decisions clearly to engineering, security, operations, and leadership audiences

Company Overview

  • Hyrhub was founded in 2014; hiring niche talent is still a problem faced by many companies. It was founded in 2018 and is headquartered in Bangalore, Karnataka, India, with a workforce of 2-10 employees. Its website is .