[Remote] Site Reliability Engineer

Remote | Full-time | Hiring now

Note: This is a remote position open to candidates in the USA. TalentDome Staffing is recruiting a Senior Site Reliability Engineer (SRE) / Infrastructure Engineer on behalf of a high-growth, AI-driven narrative intelligence startup. The role carries operational ownership of a production environment, with a focus on infrastructure orchestration, high-throughput scaling, and GPU application deployment to support massive data flows.

Responsibilities

Infrastructure Orchestration: Maintain, optimize, and expand the core infrastructure, ensuring everything is cleanly declared in Terraform and managed across high-performance Kubernetes clusters.
High-Throughput Scaling: Design and manage environments capable of sustaining large-scale data ingestion, high-throughput pipelines, and heavy search-database operations.
GPU Application Deployment: Collaborate with the R&D team to deploy, optimize, and manage highly specialized machine learning and AI applications running on GPUs.
System Optimization & Reliability: Partner closely with backend teams to optimize production Java deployments and Python workflows, ensuring maximum uptime, high availability, and seamless scaling.
Technical Leadership: Serve as a foundational pillar of the infrastructure architecture and establish operational best practices without requiring hand-holding or micromanagement.

Skills

8+ years of dedicated, hands-on experience with Kubernetes and Terraform
Ideally 15+ years of total technical experience in infrastructure or site reliability engineering
Deep architectural mastery of deployment systems, cluster orchestration, and high-availability scaling
Proven cloud hosting experience, with strong proficiency in AWS
Exposure to or experience with GCP is a significant advantage for supporting R&D workflows
Concrete experience deploying and scaling application workflows that interface with GPUs and high-volume data ingestion layers
Familiarity with or exposure to optimizing runtime environments for Java and Python applications is highly beneficial
Exceptional self-direction and problem-solving capability
Professional maturity to eventually step into a formal leadership role as the infrastructure team expands

Benefits

True Operational Autonomy: The opportunity to architect and scale greenfield deployments for a rapidly expanding AI data platform.
High-Caliber Environment: Collaborate directly with an elite team of backend engineers and machine learning R&D specialists.
Flexible, Modern Workspace: Enjoy 100% remote working flexibility across the United States.
Equity Incentives: The company is open to offering equity incentives.

Company Overview

TalentDome is your R&D talent partner in SmartTech across the software development life cycle (SDLC) and the software stack. We connect U.S. Founded in 2024, the company is headquartered in Dallas, Texas, and has a workforce of 2-10 employees. Its website is https://www.talentdomestaffing.com.

Related roles

[Remote] Cloud Platform Engineer

[Remote] Paid Media Lead, Mapping

[Remote] Network Engineer II

[Remote] Lead Data Scientist

[Remote] Business Analyst (Claims), Senior

[Remote] Senior Sales Engineer

[Remote] Senior Accountant II

[Remote] Fractional CRO, Financial and Digital Markets

[Remote] Database Track Sr.Engineer

[Remote] Business Analyst - Oracle Health

Employelevate Part Time Evening Work From Home Data Entry At The

Experienced Remote Customer Service/Data Entry Representative – Claims Coordination Team Support

Customer Service Representative

FP&A Principal

People Analytics Analyst | Remote

American Express Work From Home Jobs Chicago

Guided Solutions Phone Banker Specialist (Remote Opportunity Salt Lake City, Utah)

Senior QA Engineer - Remote

Experienced Customer Service Representative – Global Entertainment Leader

Senior/Staff/Principal SWE – OT Security Engineering
