Senior Site Reliability Engineer

by Lever

18 May 2025

#Description#

Job Title: Senior Site Reliability Engineer
Location: Remote
Job Type: Full-time
Department: Engineering (Cloud Infrastructure)
Reports To: kalyan.chakravarthy@employinc.com

Position Overview

We are looking for a Senior Site Reliability Engineer (SRE) to join our team and take a lead role in building highly reliable, scalable, and efficient systems. You will be instrumental in driving modern engineering practices that blend software engineering and infrastructure expertise. As an SRE, you’ll own the health of our production environment, define service reliability standards, and implement tooling and automation that empowers engineering teams to move fast without compromising system stability.

You’ll work cross-functionally with developers and security teams to build observability, manage incidents, and proactively reduce toil through automation, creating systems that are not just available, but resilient and maintainable.

Key Responsibilities

Develop and manage Infrastructure as Code (IaC) solutions using tools such as Terraform, Ansible, or similar
Eliminate toil through automation of deployments, monitoring, and operational tasks using tools like Terraform, Ansible, Python, or Go
Proficiency in at least one programming language (Python, Java, Go, etc.), with practical experience adhering to coding standards and design principles
Hands-on experience with CI/CD tools such as GitHub Actions or Argo CD
Excellent troubleshooting, problem-solving, and analytical skills
Good knowledge of database administration
Collaborate with Development teams to implement and promote Site Reliability Engineering best practices
Automate routine operational tasks, deployment processes, and system monitoring
Define and maintain Service Level Objectives (SLOs) and Error Budgets for critical applications and services
Monitor system performance, proactively identify issues, and lead effective incident response and root cause analysis
Identify and implement opportunities to enhance system scalability, efficiency, and reliability
Collaborate with Security teams to ensure systems align with ISO 27001, SOC 2, and other compliance standards
Stay current with industry trends and emerging technologies to continually improve our SRE capabilities

Minimum Qualifications

5+ years of experience in Site Reliability Engineering or a similar role
Proficiency in one or more programming/scripting languages such as Java, Python, Go, PHP, or Ruby
Strong experience with Unix/Linux systems administration and internals
Solid understanding of system design, distributed computing, and SRE principles
Expertise with containerization and orchestration technologies such as Docker and Kubernetes
Experience with one or more cloud platforms: AWS, Azure, or Google Cloud Platform (GCP)
Experience with relational databases like PostgreSQL, MySQL, or SQL Server

Preferred Qualifications (Good to Have)

Proficiency in scripting and automation using tools such as Node.js
Familiarity with NoSQL solutions such as MongoDB, Redis, DynamoDB
Ability to monitor, optimize, and troubleshoot database performance in high-availability environments
Experience with backup, replication, and data recovery strategies
Excellent communication and collaboration abilities

Why Join Us

Continuous learning culture with opportunities to explore emerging technologies
Develop and maintain applications across multiple ecosystems, each with distinct architectural patterns and design considerations
Collaborate with talented engineers who value innovation and ownership
Flexible work environment with a focus on outcomes and autonomy

Employment Type

On-site
