Tokka Labs is a proprietary trading firm with a focus on close collaboration, rigorous research, and cutting-edge technology. We are market makers, searchers, and solvers for top protocols on the most popular blockchains in the world. We design and implement our own trading systems and strategies to provide liquidity in the most diverse and challenging environments. At the core of it all lies our unwavering commitment to pushing the boundaries of decentralized finance, and we are always on the lookout for like-minded individuals to join us on this journey.
We’re looking for a seasoned Data Engineer to build and own our entire real-time data platform from the ground up.
This is not a support role: you’ll architect, implement, and operate the full data stack that powers our HFT market-making strategies, risk systems, research, analytics, and AI/ML capabilities. You’ll ingest, process, and serve massive high-frequency time-series datasets (tick-level market data, order books, trades, on-chain events) under sub-second latency requirements.
Design and build scalable ETL/ELT pipelines for real-time streaming of exchange and on-chain data feeds (Binary / Native, SBE, WebSockets, FIX, gRPC, etc.).
Engineer high-performance storage and querying layers using ClickHouse (or equivalent columnar/time-series DBs) optimized for billions of rows of tick and order-book data.
Implement real-time data processing, transformation, and enrichment pipelines (Kafka / Kinesis / Flink or equivalent).
Ensure data quality, observability, backfilling, replay capabilities, and low-latency delivery to trading engines, quant researchers, and risk systems.
Own infrastructure-as-code, monitoring, alerting, and cost optimization across the full data platform.
Collaborate closely with quants, traders, and software engineers to translate trading and research needs into production data infrastructure.
Build and optimize data pipelines for feature engineering and dataset generation from high-frequency time-series data to support machine learning model training and evaluation.
Design real-time data serving layers (e.g., low-latency queries from ClickHouse) to power online inference and live AI-driven trading signals or risk models.
Implement data versioning, lineage, and quality controls to enable reliable, reproducible ML experimentation and production deployment.
7+ years as a data engineer in a high-volume, low-latency environment (HFT/prop trading, fintech, or large-scale real-time analytics strongly preferred).
Deep, production-level expertise with ClickHouse (or similar: TimescaleDB, Druid, Pinot, etc.) for large-scale time-series workloads.
Proficiency with Python, SQL, and data orchestration tools (Airflow, Dagster, or equivalent).
Prior work building data infrastructure for ML/AI (feature stores, model training pipelines, or real-time inference data layers).
Proven track record of owning end-to-end data platforms as a senior individual contributor or small-team lead — you’ve been the “one-man band” before and delivered results under pressure.
Ability to think on your feet, adjusting your own priorities to deliver measurable impact to the team.
A genuine interest and passion for agentic AI development, machine learning, and building autonomous workflows.
Reflect on constructive feedback, share knowledge openly, and be prepared to both learn and unlearn, whilst contributing to transparent decision-making.
Your decision-making is evidence-based, using data to evaluate success and improve efficiency.
Tokka Labs does not accept unsolicited resumes.
Any form of candidate introduction shared without the prior approval of the talent acquisition team will be deemed free to contact by Tokka Labs without restriction or liability. No placement fee of any kind will be paid in the event the identified candidate is hired by Tokka Labs.