Job Title: Senior Site Reliability Engineer
Location: Remote
Job Type: Full-time
Department: Engineering (Cloud Infrastructure)
Reports To: kalyan.chakravarthy@employinc.com
Position Overview
We are looking for a Senior Site Reliability Engineer (SRE) to join our team and take a lead role in building highly reliable, scalable, and efficient systems. You will be instrumental in driving modern engineering practices that blend software engineering and infrastructure expertise. As an SRE, you’ll own the health of our production environment, define service reliability standards, and implement tooling and automation that empower engineering teams to move fast without compromising system stability.
You’ll work cross-functionally with developers and security teams to build observability, manage incidents, and proactively reduce toil through automation, creating systems that are not just available but resilient and maintainable.
Key Responsibilities
- Develop and manage Infrastructure as Code (IaC) solutions using tools such as Terraform, Ansible, or similar
- Eliminate toil through automation of deployments, monitoring, and operational tasks using tools like Terraform, Ansible, Python, or Go
- Proficiency in programming, with practical experience adhering to coding standards and design principles in a language such as Python, Java, or Go
- Hands-on experience with CI/CD tools such as GitHub Actions or Argo CD
- Excellent troubleshooting, problem-solving, and analytical skills
- Good knowledge of database administration
- Collaborate with Development teams to implement and promote Site Reliability Engineering best practices
- Automate routine operational tasks, deployment processes, and system monitoring
- Define and maintain Service Level Objectives (SLOs) and Error Budgets for critical applications and services
- Monitor system performance, proactively identify issues, and lead effective incident response and root cause analysis
- Identify and implement opportunities to enhance system scalability, efficiency, and reliability
- Collaborate with Security teams to ensure systems align with ISO 27001, SOC 2, and other compliance standards
- Stay current with industry trends and emerging technologies to continually improve our SRE capabilities
Minimum Qualifications
- 5+ years of experience in Site Reliability Engineering or a similar role
- Proficiency in one or more programming/scripting languages such as Java, Python, Go, PHP, or Ruby
- Strong experience with Unix/Linux systems administration and internals
- Solid understanding of system design, distributed computing, and SRE principles
- Expertise with containerization and orchestration technologies such as Docker and Kubernetes
- Experience with one or more cloud platforms: AWS, Azure, or Google Cloud Platform (GCP)
- Experience with relational databases like PostgreSQL, MySQL, or SQL Server
Preferred Qualifications (Good to Have)
- Proficiency in scripting and automation using tools such as Node.js
- Familiarity with NoSQL solutions such as MongoDB, Redis, or DynamoDB
- Ability to monitor, optimize, and troubleshoot database performance in high-availability environments
- Experience with backup, replication, and data recovery strategies
- Excellent communication and collaboration abilities
Why Join Us
- Continuous learning culture with opportunities to explore emerging technologies
- Develop and maintain applications across multiple ecosystems, each with distinct architectural patterns and design considerations
- Collaborate with talented engineers who value innovation and ownership
- Flexible work environment with a focus on outcomes and autonomy