About Busha
Since its inception in 2018, Busha has been at the forefront of digital currency exchange, offering trading in cryptocurrencies such as Bitcoin and Ethereum. Our mission is to build an open financial system characterized by innovation, efficiency, and user-friendliness. We value Clear Communication, Positive Energy, Efficient Execution, and Continuous Learning in all our team members.
Overview: We are seeking an experienced SRE to engineer the reliability of our Platform API. You will work closely with DevOps on infrastructure and observability, while partnering with backend engineers to build reliability into our services from the ground up. You will combine deep knowledge of distributed systems with hands-on Go coding skills to define SLOs, lead incident response, build automation, and embed resilience patterns directly into our codebase.
Responsibilities
- Act as Incident Commander during high-severity incidents affecting payments, trading, or compliance systems: coordinate cross-functional response, provide clear status updates, and drive post-mortems.
- Design and implement observability strategies using Grafana, Sentry, and CloudWatch. Instrument Go services to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product and engineering teams.
- Write production-ready code to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention. Contribute reliability patterns (circuit breakers, retries, backpressure) directly to backend services.
- Partner with backend engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns from day one.
- Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions, particularly for financial transactions and compliance workflows.
Must Have
- Minimum of 4 years of experience in SRE or Backend Engineering with strong proficiency in Go. You can read, write, and review production Go code, not just deploy it.
- Deep understanding of distributed systems architecture and design patterns. Strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
- Hands-on experience with AWS (ECS, RDS, CloudWatch, Lambda) or GCP, and infrastructure as code. Proficiency in running production workloads and troubleshooting infrastructure issues.
- Experience designing and implementing observability strategies using Prometheus, Grafana, OpenTelemetry, or similar tools. Ability to instrument code for proper monitoring and alerting.
- Familiarity with operating and tuning production data stores (PostgreSQL, ClickHouse) and messaging and streaming platforms (RabbitMQ, Kafka) in high-throughput environments.
Nice to Have
- Fintech bonus: Understanding of financial systems reliability requirements, payment processing resilience patterns, or experience with compliance/regulatory
- Go bonus: Proficiency in Go is a significant advantage. Our backend services are written in Go, and the ability to read, write, and contribute reliability patterns directly to production Go code will enable deeper collaboration with engineering teams and faster impact on system resilience.
What We Offer
- Progressive hybrid work policy.
- Competitive salary.
- Learning and development plan.
- Health insurance & pension.
- Work tools and gadgets that work for you.