
Applied AI Engineer (Inference & Personalization)

21 October 2025

Why join koodos labs?

You should join koodos labs for the people and our mission.

We aspire to build the best team of the 2020s. Just like PayPal in the 90s, Google in the 00s, and Stripe in the 10s, we want to be known as “a place where it’s good to be from.” If you join us, we promise to be the best place to grow your career — with the best people you’ve ever worked with. We’re also working on a very bold mission.

We care deeply about helping people connect with themselves and others. And in the spirit of personal empowerment, we aim to invert the internet’s data model over time and put individuals in control of their digital identities.

Read more about working at koodos labs here.

What we’re building

We’re the Context Company. Context is what turns generic AI into personalized intelligence. It’s all the signals available to a product: environmental cues, device state, user history, intent. Context is what transforms a cold machine into something that knows and understands you.

koodos is building Shelf, a user-controlled data store used several times a week by millions of people to track what they’re consuming and keep up with what others are into. It connects to any platform, aggregates your credentials and consumption data, learns from your activity, and, eventually, will let you share that context with services for more personalized experiences. Shelf lets you store your digital life in one place — then take it anywhere.

The future is personalized experiences powered by user-controlled context. We’re building a future where apps and services come to you and your memory — not the other way around. See our latest paper here.

The Opportunity

Personalization effects are the new network effects. The better an experience understands and adapts to you, the more irreplaceable it becomes. We need someone who can make that adaptation happen in milliseconds, not minutes. You’ll build the inference and personalization systems that turn rich context into magical experiences — making every interaction feel impossibly personal, instantly.

In this role you will:

Build Personalization for Shelf: Ship the recommendation systems, preference prediction, and discovery features that make Shelf feel magical.

Optimize Inference at Scale: Build model serving infrastructure that delivers personalized experiences in <50ms. Implement caching strategies, design feature stores, and create inference pipelines that handle millions of users without burning compute. Balance sophistication with speed — magic can’t be slow.
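As one illustration of the precompute-vs-on-demand trade-off this bullet describes, here is a minimal sketch of a per-user TTL cache in Python. It is not koodos’s actual infrastructure; `TTLCache` and the `compute_fn` model call are hypothetical names for a cached inference lookup that serves fresh precomputed results and falls back to on-demand inference when an entry expires:

```python
import time

class TTLCache:
    """Tiny TTL cache for per-user inference results: serve a precomputed
    result while it is fresh, recompute on demand once it goes stale."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}  # user_id -> (timestamp, value)

    def get(self, user_id, compute_fn):
        entry = self._store.get(user_id)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]                  # cache hit: no model call
        value = compute_fn(user_id)          # cache miss: run inference
        self._store[user_id] = (now, value)
        return value
```

In practice the same idea usually lives in a shared store like Redis rather than process memory, but the latency logic is the same: a hit costs a lookup, a miss costs a full inference pass.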

Transform Signals into Predictions: Design real-time feature engineering from our memory stores. Take raw consumption signals (what users watch, read, listen to, etc.) and build inference systems that predict preferences, detect taste shifts, and understand cross-domain interests. Turn behavioral data into actionable intelligence.

Create Feedback Loops: Implement multi-armed bandits for explore/exploit in recommendations. Make personalization improve continuously.
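For readers unfamiliar with the explore/exploit pattern mentioned above, a minimal epsilon-greedy bandit looks like the sketch below. This is a generic textbook formulation, not a description of Shelf’s recommender; the class and method names are illustrative:

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy bandit: explore a random arm with probability
    epsilon, otherwise exploit the arm with the best mean reward."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}    # pulls per arm
        self.values = {a: 0.0 for a in self.arms}  # running mean reward

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)                       # explore
        return max(self.arms, key=lambda a: self.values[a])       # exploit

    def update(self, arm, reward):
        # Incremental mean update: v += (r - v) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Contextual bandits extend this by conditioning arm selection on user features, which is what makes the loop “improve continuously” as new feedback arrives.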

Enable Instant Onboarding: Design APIs and SDKs that let any app instantly personalize based on user context. Build the inference layer that makes “cold starts” impossible — every new app already knows what users love.

Optimize Across the Stack: Implement hybrid local/cloud inference strategies. Push intelligence to the edge when it makes sense, centralize when scale demands it. Build systems that gracefully degrade from 100M token contexts to mobile constraints.

Partner with Memory Systems: Work hand-in-hand with our Eng team to define the interface between memory and application. Together, you’ll ensure that rich context translates into immediate value for users.

Measure and Iterate: Build A/B testing infrastructure for personalization strategies. Implement feedback loops that improve recommendations over time. Create metrics that capture the difference between “correct” and “delightful.”
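A common building block for the A/B testing infrastructure above is deterministic hash-based bucketing, so a user always lands in the same variant without any stored assignment state. A sketch, with hypothetical names (`assign_variant` is not an existing koodos API):

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministic A/B bucketing: hash (experiment, user) so each user
    gets a stable variant, independent across experiments."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]
```

Salting the hash with the experiment name keeps assignments uncorrelated between experiments, so one test’s treatment group doesn’t silently bias another’s.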

Own Production Excellence: Full ownership of inference infrastructure — from model deployment to monitoring, from capacity planning to on-call. You’ll ensure personalization never sleeps.

We’re early stage, so you’ll have an outsized equity stake and an unparalleled opportunity to define how AI serves humanity. Engineers at koodos labs decide what gets built and why, in addition to figuring out how.

We’re NY-based (we have our own office in West Soho) and we work in-person.

Our Team

We’re a small but mighty team with backgrounds at YouTube, Coinbase, Harvard, and Cambridge, including startup founders and early members at companies like Improbable and Lyft. We’ve come together around a shared vision and are dedicated to creating important and positive experiences for cyberspace.

We’re well-resourced (unannounced rounds) and backed by top-tier investors, including the backers of companies like Airbnb, Pinterest, Snap, and Twitter; the founders of companies like Zynga, VSCO, and Scale; and the people behind artists like Miley Cyrus, Justin Bieber, Lorde, Logic, and Panic! At The Disco. Our team is advised by the founders of Pinterest and Dubsmash (now Reddit), as well as pioneers of digital marketing and market design from Harvard.

We care about building a genuinely diverse team. We are a majority first-gen immigrant team and sponsor visas — we think that’s important as we build towards enabling easier digital migration. We share the same values of individuality, empathy, reliability, kindness, and humility. One big overlap among our life experiences is contrast: between our own upbringing and the world around us, between what was expected of us and what we ended up pursuing, and in our bringing together of contrasting, interdisciplinary worlds.

More about us here.

Ideal Candidate

We recognize that a confidence gap might discourage amazing candidates from applying. Every job description is a wish list, so please reach out if this role really excites you.

We’re looking for someone who can bridge cutting-edge ML with real-world impact. You might be a great fit if you are:

Production-Ready ML Builder: You’ve deployed ML systems at scale, serving millions. Skilled in PyTorch/TensorFlow and model optimization (quantization, distillation, pruning) using inference frameworks like TorchServe, Triton, and ONNX.

Latency Hacker: You’ve built ultra-fast systems (<100ms). You know how to use caching, batching, precompute vs. on-demand, and high-performance serving (gRPC, protobufs) to support millions of QPS.

Personalization Architect: You’ve built recommendation engines that feel alive. Beyond collaborative filtering, you understand contextual bandits, explore/exploit trade-offs, and real-time signals. You know what makes Netflix’s experience different from YouTube’s — and could build either.

Adaptable Learner: You keep pace with fast-moving ML — from RecSys to LLM-powered personalization. You can separate hype from substance and quickly evaluate new techniques.

Edge & Cloud Strategist: You’ve balanced on-device vs. server-side inference, optimizing for latency, reliability, and even battery life.

Pragmatic Shipper: You bias toward action. You know when to fine-tune, when to leverage pre-trained models, and when “good-enough” wins. You measure success by user delight, not just academic metrics.

Infra-Savvy Engineer: Comfortable with ML infra: feature stores, model registries, experiment tracking, and A/B testing at scale. You know how to version models, ship safely, and roll back when needed.

Clear Communicator: You can translate complex ML systems into product realities, making your work understandable to teammates and stakeholders.

Startup-Ready: You thrive in ambiguity and can go 0→1. You’re comfortable wearing many hats — from debugging CUDA kernels to shaping user-facing APIs.

What you’ll do

Day 1: You’ll merge your first PR. We want our developer experience to be as smooth as possible, so this first day is a good test of how we’re doing there.

Day 7: You’ll ship your first inference optimization to production. Maybe you’ll cut latency by 30% with better caching, or implement request batching that doubles throughput. You’ll experience our full deployment pipeline and see real users benefit from your work.

Day 50: You’ll have shipped personalization that makes users say “how did it know?” You’ll have built the first version of our real-time recommendation system, integrated with our memory architecture, and created an API that third-party developers are excited to use. You’ll have established strong collaboration patterns with our Memory Engineer, shipping features that showcase intelligent context in action.

In the Future: You’ll define how the world experiences personalized AI. As our Memory Engineer expands what we can remember about users (with their permission), you’ll ensure that knowledge translates into delightful, instant experiences. You’ll build the inference infrastructure that makes “context liquidity” real — where any app can instantly understand and serve users perfectly. Your systems will prove that personalization effects are indeed the new network effects.

How we interview

We aim to move fast — typically two weeks end-to-end. We find this works best for both us and you!

And we focus on real-world experience, demonstrated ability, and references. No riddles or binary tree puzzles — just a thoughtful look at what you’ve done and where you’re headed.

Introduction: We’ll kick things off with a call with one of our co-founders.

Technical Screening: A conversation with our CTO to dig into your experience with ML systems, inference optimization, and personalization. We’ll discuss your thoughts on the evolution from traditional RecSys to context-aware AI.

Take-Home Exercise: A short presentation (video or written) on either: (1) a personalization system you’ve built and its impact, (2) your approach to serving LLMs at scale with <100ms latency, or (3) how you’d design instant personalization APIs for third-party developers.

Onsite: You’ll meet the team. We spread this across two days to reduce load:

Day 1: Systems design deep dive — build a personalization system that scales to millions while maintaining <50ms latency

Day 2: Lunch with the team, behavioral interview, and a collaborative session designing the interface between memory and inference systems (you’ll work through this with the same approach you’ll use with our Memory Engineer!)

References: We’ll ask for 2–3 references, and may also reach out to others you’ve worked with (let us know if there are any sensitivities). We’ll keep it brief and respectful — usually ~15 mins per call. You’re also welcome to reference check us.

Decision: We’ll move quickly on our end.

How to apply

If interested, please drop us a line on joinus@koodos.com with your resume and your thoughts on how personalization could work for Shelf.

FAQs

Where will I work?

What tech stack do you currently use for ML/inference?

What’s unique about personalization at koodos?

Are you hiring interns?

Are you open to part-time?

Where can I find more info?

Do you sponsor visas?

Employment Type
On-site
