Senior Infrastructure & Operations Engineer (Kubernetes / Platform Reliability)

12 April 2026

About the Role

We’re looking for a senior infrastructure and operations engineer to own and evolve our platform reliability. You’ll design, operate, and maintain our Kubernetes-based infrastructure, build reliable monitoring and alerting pipelines, and ensure our systems remain stable under real-world load and failure conditions.
This is a hands-on role for someone who has deep experience running production systems at scale and who focuses on making infrastructure predictable and stable. You’ll work across Kubernetes, networking, CI/CD, Cloudflare, and observability to create a platform engineers can trust.

What You’ll Do

  • Design, deploy, and maintain production Kubernetes clusters.
  • Own cluster reliability, upgrades, security, and performance.
  • Build and operate monitoring, logging, and alerting pipelines.
  • Ensure full-stack observability across infrastructure and services.
  • Design and maintain CI/CD pipelines that are fast, reproducible, and safe.
  • Improve deployment strategies (rollouts, canaries, rollbacks).
  • Automate infrastructure provisioning and configuration.
  • Investigate and resolve production incidents.
  • Improve system resilience, redundancy, and recovery strategies.
  • Define SLOs/SLIs and track reliability targets.
  • Optimize and maintain our Cloudflare setup (caching, routing, security, edge behavior).
  • Work closely with engineering teams to improve operational practices.
  • Identify and remove single points of failure.

What We’re Looking For

Must-have

  • Senior-level experience operating production infrastructure.
  • Deep, hands-on expertise with Kubernetes (cluster internals, networking, storage, security).
  • Strong networking fundamentals (TCP/IP, routing, DNS, TLS, load balancing).
  • Experience debugging distributed systems and network-related issues.
  • Experience optimizing CDN and edge setups, including Cloudflare.
  • Strong experience building monitoring and observability systems.
  • Experience with metrics, logs, traces, and alerting pipelines.
  • Experience designing reliable CI/CD pipelines.
  • Strong Linux fundamentals.
  • Experience with infrastructure as code and automation.
  • Ability to debug issues across the entire stack.
  • Experience handling incidents and conducting postmortems.

Nice-to-have

  • Experience with multi-cluster or multi-region setups.
  • Experience with high-throughput or data-heavy systems.
  • Experience with Elasticsearch or large-scale data infrastructure.
  • Experience with service meshes.
  • Experience with cost optimization and capacity planning.
  • Experience in regulated or reliability-focused environments.

How You Work

  • You assume infrastructure will fail and design accordingly.
  • You prioritize reliability, visibility, and recoverability.
  • You build systems that engineers trust in production.
  • You automate carefully and deliberately.
  • You are calm and methodical during incidents.
  • You focus on long-term stability over short-term fixes.
  • You document and standardize important processes.

Example Problems You Might Work On

  • Hardening Kubernetes clusters for high availability and safe upgrades.
  • Debugging network latency or connectivity issues across services.
  • Optimizing Cloudflare caching, routing, and edge security rules.
  • Building monitoring pipelines that provide reliable signals.
  • Designing alerting that is actionable and low-noise.
  • Improving deployment reliability and rollback safety.
  • Removing single points of failure in production systems.
  • Ensuring observability across all services and data pipelines.

Why Join Range

  • Join one of the fastest-growing sectors in Web3 as stablecoins reach mass adoption.
  • Competitive compensation with meaningful equity upside.
  • Strong potential for growth and leadership opportunities.
  • Remote-first culture with bi-yearly international off-sites.
  • Opportunities for global conference travel and ecosystem engagement.
  • Health and wellness benefits.

How to Apply

Send us:

  • A short introduction and your background.
  • Examples of infrastructure or platform work you’ve led.
  • Any public write-ups, repositories, or talks (if available).

We’re particularly interested in engineers who have built and operated Kubernetes platforms, improved network reliability, and optimized CDN/edge setups such as Cloudflare in production.

Employment Type

On-site
