[Remote] Sr. Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. AuthZed is a Series A company focused on fixing broken access control with innovative products. As a Site Reliability Engineer, you will ensure the reliability and performance of our systems while designing and maintaining scalable infrastructure to support a growing customer base.
Responsibilities
- Design, implement, and maintain highly available and scalable infrastructure solutions for our projects, products, and customers
- Monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability
- Automate infrastructure deployment and configuration management processes
- Continuously improve system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning
- Troubleshoot and resolve complex infrastructure and application issues in production and test environments
- Collaborate with software engineering teams to design and implement systems that are resilient, scalable, and secure
- Participate in on-call rotation and respond to production incidents in a timely manner
- Document system configurations, troubleshooting procedures, and operational guidelines
Skills
- Proven experience as a Site Reliability Engineer or in a similar role
- Strong understanding of networking, operating systems, and cloud infrastructure
- Experience with Site Reliability Engineering, System Design, and Distributed Computing
- Experience in various programming languages — we currently have SDKs for NodeJS, Java, Python, Ruby, and Go
- Experience with containerization technologies such as Docker and Kubernetes
- Knowledge of infrastructure-as-code tools like Terraform and Pulumi
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
- Experience with lower-level implementation details of relational databases (bonus if you have experience with distributed SQL databases like Google Cloud Spanner or CockroachDB)
- Experience working with Git and GitHub
- Experience with continuous integration and deployment systems
- Strong problem-solving and troubleshooting skills
- Excellent communication and collaboration abilities
- Experience with Authorization systems
Benefits
- Stock options at an early-stage startup
- Comprehensive benefits including healthcare (US-based) and other insurance
- A fully remote role with a flexible schedule to accommodate different time zones
- Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun!
Company Overview