HQ – NYC
Full time
On-site
Engineering
We’re looking for a Platform Engineer, Applied Evaluations to define and operationalize quality for the agentic systems that power Antimetal’s investigation and automation engine.
This role is core to our product. You’ll own online and offline evaluation pipelines that operate over petabytes of infrastructure data, and shape agent platform abstractions where necessary to ensure our agents are measurable, debuggable, and reliable. You’ll partner closely with platform, product, and research, leveraging quality signals to accelerate iteration across the company.
Antimetal is building the future of infrastructure management. We’re starting by creating a platform that investigates, resolves, and prevents issues—giving engineers their time back to focus on what they do best: building great products.
Own the evaluation stack: Build online and offline eval pipelines that measure agent quality across ephemeral, voluminous MELT data, code, and unstructured docs. Set the metrics that define the experience.
Define quality at scale: Production incidents span hundreds of services, and the data behind them is ephemeral and high-volume, with ground truth that is only approximate. Design evals that capture trajectory quality, not just final outputs, and validate that your metrics predict real outcomes.
Build platform abstractions for agents: Design core agent architectures and extend internal frameworks (e.g. sub-agents, MCPs, middleware) that let product, platform, and research iterate with confidence and ship faster.
Productionize: Own latency, observability, and uptime.
At least 3 years of experience in ML platform engineering, data engineering, or a related role, preferably at a high-growth company.
Prior experience designing evaluation systems where data is high-volume and ground truth is noisy and hard to label (e.g. computer vision, deep research pipelines).
Strong system design skills: you think about how data flows through distributed systems and how decisions compound at scale.
Proven ability to write clean, scalable code and strong data modeling skills.
Demonstrated ability to bring ambiguous goals from prototype to production, using data and experimentation to drive product and architectural decisions.
Proficient in Python and TypeScript, with experience using common ML libraries and data engineering tools.
Experience with SRE best practices and modern observability (OpenTelemetry, distributed tracing).
Strong on ML fundamentals: classification/regression, clustering, dimensionality reduction, evaluation + error analysis, probabilistic ML
Experience with agent architectures: multi-step reasoning, tool use, context management
Identify as a builder
Are excited to work in-person from our new and spacious office in New York
Love working in a startup environment (experience in a startup or obsession with going zero-to-one)
Enjoy working with people who are ambitious, caring, and think in systems
Thrive in a fast-paced iterative environment where experimentation is essential
Pay & ownership — Competitive salary with generous equity grants.
Full coverage + retirement — Fully covered health, dental, and vision, plus retirement benefits.
Unlimited PTO — Take the time you need to recharge.
Dinner on late nights — Working late? Dinner is on us.
Fitness stipend — Monthly support for your health and wellness.
Tools of the trade — Any equipment you need to do your best work.
Commute perks — Citi Bike + train benefits.
Application Review: Send us your stuff and a quick note on why you’re excited.
Intro Chat: Share what you’re looking for next and learn more about what we’re building.
Founder Interview: Talk with one of our founders in more detail about the role.
Technical Interview: We’ll have you complete a short exercise specific to the role.
Onsite: Come onsite and meet the team through a series of 1:1 interviews.
Decision: We’ll move fast.
Compensation Range: $225K – $325K