Senior Software Engineer, Managed Orchestration (Kubernetes)

23 August 2025

Location

San Francisco, CA – US

Employment Type

Full time

Location Type

On-site

Department

Cloud Engineering

Crusoe’s mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability.

Be part of the AI revolution with sustainable technology at Crusoe. Here, you’ll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

Overview

Crusoe is building the world’s favorite AI-first cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the “gold standard” for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

About the Role

We are building Crusoe’s next-generation cloud orchestration platform, centered on Kubernetes at scale. As a Senior Software Engineer on the Managed Orchestration team, you’ll design and deliver features that power Crusoe’s managed Kubernetes service, enabling high-performance workloads on CPUs and GPUs in distributed environments.

This role requires deep technical expertise in distributed systems, Kubernetes internals, and modern cloud-native architectures. You’ll work closely with teams across GPUs, Networking, and Storage to build a reliable, scalable, and secure orchestration layer for customers running mission-critical workloads.

What You’ll Do

  • Architect, build, and operate features for Crusoe’s Managed Kubernetes platform (control plane, autoscaling, cluster lifecycle, upgrades, multi-tenancy).

  • Integrate and optimize GPU workloads within Kubernetes clusters, including device plugins, GPU operators, scheduling, and monitoring.

  • Enhance container networking through advanced CNI integration (Cilium, Calico, Multus) and support for high-performance networking (InfiniBand, RoCE).

  • Improve reliability and resilience of Kubernetes clusters, including HA control planes, node lifecycle management, and self-healing mechanisms.

  • Contribute to open-source and internal tooling that enhances observability, automation, and cluster security.

  • Participate in design reviews, provide mentorship to engineers, and help set long-term technical direction.

  • Troubleshoot complex distributed systems problems spanning containers, GPUs, and networking.

What We’re Looking For

  • 5+ years of software engineering experience in distributed systems, cloud, or infrastructure.

  • Deep understanding of Kubernetes internals (control plane, scheduling, operators, controllers, API machinery).

  • Strong proficiency in Go (preferred) or similar languages (Rust, C++, Python for systems work).

  • Experience with container networking (CNI plugins, service mesh, load balancing) and Linux networking fundamentals.

  • Exposure to GPU workloads in Kubernetes (device plugins, GPU operators, scheduling, autoscaling).

  • Familiarity with cloud platforms (AWS, GCP, or Azure) and infrastructure automation (Terraform, Helm, GitOps).

  • Strong debugging and performance optimization skills for distributed systems.

  • Passion for building reliable, developer-friendly platforms that abstract complexity for customers.

Bonus Points

  • Familiarity with NVIDIA and AMD GPUs, device plugins, and operators for GPU lifecycle management.

  • Knowledge of network operators and CNI implementations (Cilium, Calico, Multus).

  • Experience with high-performance networking technologies (InfiniBand, RoCE).

  • Contributions to Kubernetes SIGs, CNCF projects, or related open-source communities.

  • Experience with Slurm, MPI, or HPC-style job schedulers.

  • Familiarity with service meshes (Istio, Linkerd) and multi-cluster networking.

  • Background in security for containers, GPUs, and Kubernetes (PodSecurity, RBAC, runtime scanning).

Compensation Range:

Compensation will be paid in the range of $166,000 – $204,000. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
