Issue #374 - The ML Engineer
The DuckDB Agent Data Extension, State of Data Eng Report, New Tabular Foundation Model, Speeding Up LLM Inference, Gemini Deep Think 3 + more
New project release!
We are thrilled to announce the release of a new DuckDB Extension for Querying Agent Data!
Links below to the official DuckDB documentation site, as well as the Streamlit showcase.
If you want to support the momentum, please do reshare, open an issue, and/or give the repo a star.
https://duckdb.org/community_extensions/extensions/agent_data
This week in ML Engineering:
The DuckDB Agent Data Extension
State of Data Eng Report
New Tabular Foundation Model
Speeding Up LLM Inference
Gemini Deep Think 3
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more
Excited to announce a new project release! Our DuckDB extension for querying Agent Data is now officially part of the DuckDB Community Extensions. This means you can load it directly in your DuckDB session with:
INSTALL agent_data FROM community;
LOAD agent_data;
This is something I've been looking forward to for a while, as there is so much you can do with local agent data from Copilot, Claude, Codex, etc. Now you can easily ask questions such as:
-- How much have I used Claude Code recently?
SELECT date, message_count, tool_call_count
FROM read_stats() ORDER BY date DESC LIMIT 10;
-- Which tools does GitHub Copilot use most?
SELECT tool_name, COUNT(*) AS uses
FROM read_conversations('~/.copilot')
GROUP BY tool_name ORDER BY uses DESC;
This has also made it quite simple to build interfaces that navigate agent sessions across multiple providers. For this, the repo comes with a simple Marimo example, as well as a Streamlit example, that let you play around with your local data.
The best thing is that you can do all this from the comfort of the proven and tested DuckDB engine, without any dependencies. Besides extending to other providers (Gemini, Codex, etc.), there are also interesting avenues to explore around streaming and other features.
Check it out - do share feedback and thoughts!
This "State of Data Engineering" is one of the best reports I have seen, and the interactive charts are some of the best UX I've come across - key insights:
cloud data warehouses remain the default (~44%)
lakehouse adoption continues to grow (~27%)
architecture choices vary by org size
individual AI tool usage is now pervasive (82% daily+)
organizational AI maturity still lagging
Some of the top challenges highlighted:
the biggest blockers are organizational
data modeling stands out as a widespread pain point
unclear ownership
long-term maintainability issues
burden of firefighting
This is a great interactive experience - check out the overview as well as the interactive charts. Huge kudos for such a great UX + using DuckDB-WASM!
Inria launches a new tabular foundation model! This space is one of the most exciting areas of "boring ML", as it could be transformational for key areas like risk, fraud, ops, pricing, forecasting and more. This is a hard problem, as tabular datasets are highly heterogeneous; the new foundation model, TabICLv2, makes training-free in-context learning practical for real tabular workloads, and one of the things that is still hard to believe is that it's trained mostly on synthetic data (which seems to be the case for most of these models). On the architecture side, it includes a scalable softmax attention temperature scheme to avoid attention degradation as the number of rows grows, so it can generalize to much larger tables without having to pretrain on prohibitively long sequences. It also has an improved pretraining protocol, and in benchmarks across TabArena and TALENT it surpasses various other models out of the box. It is quite interesting to see how fast these models are evolving, with competing models and architectures appearing every couple of months - this is certainly an exciting field to keep an eye on!
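The attention-temperature idea above can be sketched in a few lines. This is a toy illustration only (not TabICLv2's actual formulation): if the softmax temperature shrinks as the row count grows, a fixed logit advantage for a relevant row no longer washes out in long contexts.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with an explicit temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def max_attention(n, logit_gap=2.0, length_scaled=True):
    # One "relevant" row with a fixed logit advantage over n-1 distractors.
    logits = [logit_gap] + [0.0] * (n - 1)
    # Hypothetical length-aware temperature: shrink it as log(n) grows,
    # which sharpens the logits for longer contexts (sketch of the idea only).
    t = 1.0 / math.log(n) if length_scaled else 1.0
    return softmax(logits, temperature=t)[0]

# Plain softmax: the relevant row's weight decays as distractor rows are added.
plain_small = max_attention(16, length_scaled=False)
plain_large = max_attention(4096, length_scaled=False)

# Length-scaled: the logit gap is amplified with log(n), resisting fade-out.
scaled_small = max_attention(16)
scaled_large = max_attention(4096)
```

With the plain softmax the relevant row's weight collapses from ~0.33 at 16 rows to well under 1% at 4096 rows, while the length-scaled variant keeps it near 1.0 - which is the degradation-avoidance behaviour the paragraph above describes.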
Inference speed is becoming the new moat - the recent Opus 4.6 fast vs Codex fast is a good example. What is most interesting is the different approaches that orgs are taking to get there:
1) Anthropic's fast mode seems to keep the same Opus 4.6 model but run it with much smaller batch sizes: you pay a big premium to avoid queueing and throughput-oriented batching, improving per-user latency while reducing overall hardware efficiency.
vs 2) OpenAI's fast mode instead achieves an order-of-magnitude tokens/s jump by serving a different model (Codex-Spark) on Cerebras wafer-scale chips, whose large on-chip SRAM can keep more of the model in fast memory and avoid weight-streaming bottlenecks.
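The batching trade-off in 1) can be made concrete with a toy latency model (illustrative numbers only, not vendor figures): each decode step pays a fixed cost plus a small per-sequence cost, so shrinking the batch speeds up each user while collapsing aggregate throughput.

```python
def serving_stats(batch_size, base_step_ms=10.0, per_seq_ms=0.5):
    # Toy decode-step model: a fixed per-step cost (e.g. streaming weights
    # from memory) plus a per-sequence compute cost. Numbers are made up.
    step_ms = base_step_ms + per_seq_ms * batch_size
    per_user_tok_s = 1000.0 / step_ms        # each user gets one token per step
    aggregate_tok_s = per_user_tok_s * batch_size
    return per_user_tok_s, aggregate_tok_s

# Low-batch "fast mode": better per-user speed, worse hardware utilization.
fast_user, fast_total = serving_stats(batch_size=4)

# High-batch default: slower per user, far higher aggregate throughput.
bulk_user, bulk_total = serving_stats(batch_size=256)
```

Under these assumed numbers the low-batch mode roughly 10x's per-user tokens/s while serving ~5x fewer total tokens per GPU - which is why it commands a premium price rather than being the default.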
The takeaway seems to be that "fast" can mean either premium low-batch serving of the same model (speed via a scheduling/efficiency trade) or specialized hardware enabling a smaller model at extreme speed (speed via an architecture/model swap). The business question is whether higher tokens/s actually helps end-to-end developer productivity when error rates and rework dominate, or whether we will get to have our cake and eat it too. I feel like we are seeing a new CAP-style trade-off in the making for ML!
Google has released Deep Think 3, which basically takes Deep Research to its absolute limit, showing 84.6% on ARC-AGI-2 (vs ~65% for Opus 4.6!) - some really exciting insights: Google claims huge jumps on hard reasoning benchmarks, including 48.4% (no tools) on Humanity's Last Exam, 84.6% on ARC-AGI-2, a Codeforces Elo of 3455, and gold-medal-level performance on IMO 2025. From a production ML practitioner lens, it sounds like we can treat this as "reasoning-as-a-component" that could improve complex analysis, code generation for simulation/modeling, and review/verification workflows. It is really surprising to see how fast Google is pulling ahead of the other competitors, especially as this is integrated into their entire cloud and workspace environments; it will be interesting to see how the rest reply.
Upcoming MLOps Events
The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, outlined below.
Events we are speaking at this year:
eTail Europe - March @ Berlin
World Summit AI Europe - September @ Amsterdam
Other relevant events:
KubeCon Europe - March @ Amsterdam
PyData Berlin - April @ Frankfurt
Databricks Summit - June @ San Francisco
World Developer Congress - July @ Berlin
EuroPython 2026 - July @ Prague
EuroSciPy 2026 - July @ Krakow
Code.Talks 2026 - Nov @ Hamburg
MLOps World 2026 - Nov @ Austin
In case you missed our talks, check our recordings below:
The State of AI in 2025 - WeAreDevelopers 2025
Prod Generative AI in 2024 - KubeCon AI Day 2025
The State of AI in 2024 - WeAreDevelopers 2024
Responsible AI Workshop Keynote - NeurIPS 2021
Practical Guide to ML Explainability - PyCon London
ML Monitoring: Outliers, Drift, XAI - PyCon Keynote
Metadata for E2E MLOps - Kubecon NA 2022
ML Performance Evaluation at Scale - KubeCon Eur 2021
Industry Strength LLMs - PyData Global 2022
ML Security Workshop Keynote - NeurIPS 2022
Open Source MLOps Tools
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here's a few featured open source libraries that we maintain:
KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems, with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star!
About us
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.

