Who We Are:
At Dria, we’re building a distributed, crowdsourced hyperscaler: a movement led by everyday people that unlocks faster, more affordable inference for everyone.
Dria powers scalable, high-performance inference across diverse CPU and GPU platforms. Our mission is to deliver accessible, cutting-edge performance anytime, anywhere.
We’re developing an inference engine optimized for heterogeneous devices, along with an open-source, crowdsourced AI inference SDK tailored for distributed AI workloads.
Our research is focused on delivering high-quality AI for 8 billion unique lives, with an emphasis on compilers, sharding, peer-to-peer networks, CPU/GPU inference, and data generation.
About the Job:
As a Distributed Systems Developer, you will play a critical role in building and optimizing SDKs that power large-scale, distributed AI workloads. You will work with asynchronous frameworks, workflow orchestration APIs, and RPC communication patterns to create scalable and fault-tolerant systems. This role requires deep technical expertise, a strong product mindset, and the ability to thrive in fast-paced environments where ambiguity is part of the challenge.
Key Responsibilities:
Distributed Systems & Service-Oriented Architecture
- Design and implement scalable, fault-tolerant distributed systems with a strong focus on performance and reliability.
- Develop Python SDKs to interface with distributed computing resources and improve developer experience.
- Build and optimize RPC communication patterns for efficient service interactions.
- Work with asynchronous frameworks to ensure efficient and responsive system behavior.
- Leverage workflow orchestration APIs to schedule, manage, and track distributed computation tasks.
Developer-Focused SDK & API Design
- Build intuitive, consistent, and user-friendly SDKs that streamline developer adoption.
- Design APIs that are elegant, well-documented, and scalable, ensuring a seamless developer experience.
- Collaborate with internal teams (engineering, product, DevRel) to shape API standards and best practices.
- Optimize SDK performance for latency, scalability, and efficiency.
Infrastructure & Performance Optimization
- Work with microservice architectures to scale distributed AI workloads efficiently.
- Use MongoDB, Redis, and SQL databases to optimize storage and retrieval performance.
- Implement Dockerized services for scalable deployment in AWS cloud environments.
- Ensure security, observability, and maintainability of all distributed components.
Qualifications:
Technical Expertise
- Strong background in distributed systems and service-oriented architectures.
- Proficiency in Python SDK development, with a deep understanding of asynchronous programming.
- Experience with FastAPI, Transformers, MCP, and Ollama.
- Hands-on expertise in workflow orchestration APIs for distributed task management.
- Solid understanding of MongoDB, Redis, and SQL databases and their role in distributed computing.
- Experience working with AWS, Docker, and microservices in a cloud-native environment.
Non-Technical Skills & Mindset
- Proven ability to take products from 0 to 1 in fast-moving environments.
- Strong problem-solving skills, with the ability to navigate ambiguity and complex technical challenges.
- Passion for intuitive developer experiences, with SDKs and APIs that are easy to use and integrate.
- Strong collaboration and communication skills, with the ability to work across engineering, product, and developer teams.
What We Offer:
- Access to top business contacts.
- Direct collaboration with our founders and managing directors.
- Diverse learning and training opportunities, plus personal coaching from experienced entrepreneurs.
- Remote and hybrid working options.
- Flexible working hours.
- A dynamic work environment where you can take initiative and responsibility.
- Enjoyable team and company activities.
- An international working environment.
- A job with purpose and meaning!