Issue #391 - The ML Engineer 🤖
Autonomous Agentic Systems, OpenAI Codex Engineering, Anthropic on Self Service Analytics, Amazon's Tabular Foundation Models, Kimi K2.7 Code Model Release + more 🚀
Thank you for being part of over 70,000 ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps 🤖 You can join the newsletter at https://bit.ly/state-of-ml-2025 ⭐
If you like the content, please support the newsletter by sharing it with your friends via ✉️ Email, 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!
This week in ML Engineering:
Autonomous Agentic Systems
OpenAI Codex Engineering
Anthropic on Self Service Analytics
Amazon’s Tabular Foundation Models
Kimi K2.7 Code Model Release
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more 🚀
Autonomous Agentic Systems
Excited to see my article “Autonomous Agentic Systems at Scale” now published on Hackernoon! It provides a practical guide to “Always-On” agents 🚀 In this post I share some of the learnings gathered from extending KAOS to support autonomous long-running agents, whilst balancing runtime, memory, telemetry, and operational complexity. “Always-On” agents feel like a new kind of architectural abstraction: they are not quite a chatbot, not quite a “cron job” / workflow engine, and definitely not quite a microservice... they are a Frankenstein that borrows from all of them (+ of course with the added complexity of being stateful and non-deterministic). Despite the complexity, there is clearly a lot of opportunity with this pattern, namely for use cases where the goal persists over time and the environment keeps changing. However, it is also clear to me that the field still has quite a way to go before it can get the full value, including improvements in monitoring, operations, maintenance, research, data discovery, etc. So let’s make sure that we continue to invest in and contribute to these open questions, as they won’t answer themselves. Let me know what you think!
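The “stateful, long-running” part is where always-on agents differ most from a chatbot. Below is a minimal sketch of such a loop, assuming a hypothetical `observe` / `act` pair supplied by the caller and a local JSON file as the checkpoint store. It is not the KAOS API; a real deployment would bound memory, retries and cost far more carefully.

```python
import json
import logging
import time
from pathlib import Path
from typing import Any, Callable, Dict, List

logger = logging.getLogger("always_on_agent")


def run_always_on(
    goal: str,
    observe: Callable[[], Any],
    act: Callable[[str, List[Dict[str, Any]], Any], str],
    checkpoint: Path,
    max_ticks: int,
    interval_s: float = 0.0,
    memory_window: int = 50,
) -> Dict[str, Any]:
    """Run a bounded number of observe/act ticks, persisting state after each one.

    `observe` and `act` are hypothetical caller-supplied hooks; observations
    and actions are assumed to be JSON-serialisable.
    """
    # Resume from the last checkpoint (e.g. after a restart); the persisted goal wins.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"goal": goal, "tick": 0, "memory": []}

    for _ in range(max_ticks):  # a real always-on agent would loop until stopped
        observation = observe()
        action = act(state["goal"], state["memory"], observation)
        state["memory"].append({"tick": state["tick"], "obs": observation, "action": action})
        state["memory"] = state["memory"][-memory_window:]  # keep memory bounded
        state["tick"] += 1

        # Write to a temp file, then replace, so a crash mid-write does not
        # leave a half-written checkpoint in place.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(checkpoint)

        logger.info("tick=%d action=%s", state["tick"], action)  # minimal telemetry
        time.sleep(interval_s)
    return state
```

Calling `run_always_on` twice with the same checkpoint path continues from the persisted tick count, memory and goal, which is the property that separates this pattern from a stateless cron job.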
OpenAI Codex Engineering
OpenAI is clearly leading the way in developer productivity: their post on Codex engineering describes how a team of 3 built an internal product with 0 manually-written lines of code, with around a million lines generated by Codex and roughly 1,500 PRs over five months. It is one of the clearest signs yet that the coding-agent bottleneck is moving from model capability to engineering systems design. For production ML practitioners, the practical takeaway is that reliable agentic development is less about prompting harder and more about building the right harness around agents: repo-local knowledge, a small AGENTS.md that acts as a map instead of a giant instruction manual, browser-driven validation, local observability, mechanically enforced architecture, custom linters, and continuous cleanup of drift. The key lesson is that as agents take over more of the software lifecycle, human attention becomes the scarce resource, so teams need to encode taste, constraints, tests, telemetry, and review loops directly into the codebase rather than relying on ad-hoc docs or heroic human review.
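To make “mechanically enforced architecture” and “custom linters” concrete, here is a minimal sketch of a layering check that could run in CI, assuming a hypothetical repo split into `core`, `services` and `api` packages where imports may only point downwards. It is not OpenAI's tooling; the layer names and the message format are illustrative.

```python
import ast
from typing import List, Optional

# Hypothetical layering for illustration: a module may import from its own
# layer or any layer listed before it, never from a layer listed after it.
LAYERS = ["core", "services", "api"]


def layer_of(module: str) -> Optional[int]:
    """Return the layer index of a dotted module name, or None if unlayered."""
    top = module.split(".")[0]
    return LAYERS.index(top) if top in LAYERS else None


def check_imports(module_name: str, source: str) -> List[str]:
    """Return one message per absolute import that points to a higher layer."""
    own = layer_of(module_name)
    if own is None:
        return []  # modules outside the layered packages are not checked
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]  # relative imports are skipped in this sketch
        else:
            continue
        for target in targets:
            dep = layer_of(target)
            if dep is not None and dep > own:
                violations.append(
                    f"{module_name}:{node.lineno} imports {target} "
                    f"(layer '{LAYERS[dep]}' is above '{LAYERS[own]}')"
                )
    return violations


if __name__ == "__main__":
    src = "import os\nfrom api.routes import handler\n"
    for message in check_imports("core.models", src):
        print(message)  # → core.models:2 imports api.routes (layer 'api' is above 'core')
```

One design choice worth noting: the message names the offending line and the rule it breaks, so an agent reading the CI output gets an actionable hint rather than a bare failure.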
Anthropic on Self Service Analytics
Anthropic just shared their playbook on self-service analytics, showing how they automated 95% of their internal business analytics with Claude. Their agentic system stack reaches ~95% aggregate accuracy, and the most interesting point is that it looks much more like a governed data platform than a free-form agent, so we’re back to the basics. The main failure modes seem to be data-modelling ambiguity, data staleness, and retrieval failure, which can be addressed with canonical datasets, enforced semantic layers, lineage, curated domain docs, and Claude Code Skills that route the model through the same process a senior analyst would follow. It’s also great to see a strong emphasis on evals and observability, including offline question/answer suites, PR-level ablations, provenance footers, adversarial review, and correction harvesting. For production ML and data practitioners, it is a reminder that reliable analytics agents are less about letting an LLM loose on your warehouse and more about the basic foundations of “great data”.
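As an illustration of the “offline question/answer suites” mentioned above, here is a minimal eval harness sketch, assuming numeric golden answers and an agent wrapped as a plain function. The questions, the 1% tolerance and the stub agent are all hypothetical and are not Anthropic's setup.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class EvalCase:
    question: str
    expected: float
    rel_tol: float = 0.01  # accept answers within 1% of the golden value


def run_eval_suite(
    cases: List[EvalCase], answer_fn: Callable[[str], Optional[float]]
) -> Tuple[float, List[str]]:
    """Return (aggregate accuracy, failing questions) for an offline Q/A suite."""
    failures = []
    for case in cases:
        got = answer_fn(case.question)
        # A missing answer counts as a failure; numeric answers use a relative tolerance.
        ok = got is not None and math.isclose(
            got, case.expected, rel_tol=case.rel_tol, abs_tol=1e-9
        )
        if not ok:
            failures.append(case.question)
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


# Made-up golden set and a stub "agent" backed by a dict, for illustration only.
golden = [
    EvalCase("Q3 revenue (USD m)?", 120.0),
    EvalCase("Active accounts in March?", 4500),
    EvalCase("Churn rate in Q2?", 0.031),
    EvalCase("Average deal size (USD k)?", 52.0),
]
stub_answers = {
    "Q3 revenue (USD m)?": 120.4,
    "Active accounts in March?": 4500,
    "Churn rate in Q2?": 0.05,
    "Average deal size (USD k)?": None,
}
accuracy, failures = run_eval_suite(golden, stub_answers.get)
print(accuracy, failures)  # → 0.5 ['Churn rate in Q2?', 'Average deal size (USD k)?']
```

Keeping the failing questions, not just the score, is what makes this kind of suite useful for PR-level comparisons: a change that keeps accuracy flat but swaps which questions fail is still worth reviewing.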
Amazon’s Tabular Foundation Models
Tabular ML is still one of the most business-critical parts of production machine learning, and I didn’t know Amazon also had a tabular foundation model that tackles different modalities: Amazon Mitra is an interesting tabular foundation model that takes a different approach from training on real data; instead, it is trained purely on synthetic data, which seems to be a growing trend. Its synthetic data pipeline uses a carefully designed mixture of synthetic priors, combining structural causal models with tree-based generators such as gradient boosting, random forests and decision trees. The key idea for tabular foundation models seems to be that the data prior may matter as much as the architecture: good synthetic priors should perform well on real tasks, be diverse enough to avoid overfitting to themselves, and add distinctive patterns not already covered by other priors. Mitra uses in-context learning to condition on support rows from a new dataset and predict query labels without gradient updates, while also supporting fine-tuning and ensembling through AutoGluon. The reported results are strong across TabRepo, TabZilla, AMLB and TabArena, with Mitra outperforming TabPFNv2 (although v3 is already out), TabICL and strong task-specific baselines in several classification and regression settings, and showing better sample efficiency when fewer in-context examples are available. Tabular foundation models are clearly becoming a serious option for low-data, fast-iteration tabular prediction workflows; they will not (yet) consistently outperform specialised models, but getting a strong baseline for free is already a major win.
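To make the in-context learning interface concrete, here is a toy NumPy sketch: predictions for query rows come from attention-style weights over the labelled support rows, with no gradient updates. This is a deliberately simplified stand-in (a single distance-based attention step with no learned parameters), not Mitra's actual pretrained model; the function name and toy data are hypothetical.

```python
import numpy as np


def icl_predict(X_support, y_support, X_query, n_classes, temperature=1.0):
    """Toy in-context prediction: attend from query rows to labelled support rows.

    Nothing is fitted: the "context" (support rows + labels) is used directly
    at inference time, mirroring the ICL interface rather than any real model.
    """
    # Standardise features with statistics of the context only.
    mu = X_support.mean(axis=0)
    sigma = X_support.std(axis=0) + 1e-8
    S = (X_support - mu) / sigma
    Q = (X_query - mu) / sigma

    # Attention logits: negative squared distance, shape (n_query, n_support).
    logits = -((Q[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Mix the one-hot support labels with the attention weights.
    one_hot = np.eye(n_classes)[y_support]
    return weights @ one_hot  # (n_query, n_classes) class probabilities


X_s = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y_s = np.array([0, 0, 1, 1])
X_q = np.array([[0.0, 0.5], [10.0, 10.5]])
probs = icl_predict(X_s, y_s, X_q, n_classes=2)
print(probs.argmax(axis=1).tolist())  # → [0, 1]
```

Swapping in a new dataset only means passing different support rows; in a real tabular foundation model the attention is learned during pretraining (for Mitra, on synthetic priors) rather than hard-coded as a distance.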
Kimi K2.7 Code Model Release
The Chinese startup behind Kimi (Moonshot AI) just released Kimi K2.7 Code, and it is worth paying attention to if you care about where coding agents are headed: K2.7 Code is a coding-focused agentic model built on the previous architecture, i.e. the same large MoE shape of roughly 1T total parameters and 32B active parameters, but with stronger reported coding / agentic performance and around 30% lower thinking-token usage. The token-efficiency point is probably the most interesting part for production ML teams, because agentic coding is getting more expensive (and so is electricity!), and that cost compounds with repeated tool calls, context carry-forward, retries, and multi-step reasoning loops. The model supports a 256K context window, multimodal input, forced thinking / preserve-thinking mode, and deployment through vLLM, SGLang, KTransformers, Hugging Face, Moonshot APIs, and hosted routers like OpenRouter. The benchmark numbers look promising across coding and MCP-style tool-use tasks, although teams should treat them with care given the first-party eval setup and differences in harnesses across competing models.
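To see why a ~30% cut in thinking tokens matters more in agent loops than in single calls, here is a back-of-the-envelope token model, assuming (hypothetically) that every step re-sends the full accumulated context and that generated tokens, including thinking, are carried forward into later steps. The per-step token counts are made up for illustration and are not Kimi measurements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StepProfile:
    new_input_tokens: int  # fresh input per step (tool results, instructions)
    thinking_tokens: int   # reasoning tokens generated per step
    output_tokens: int     # visible output / tool-call tokens per step


def agent_run_tokens(
    steps: int,
    profile: StepProfile,
    thinking_scale: float = 1.0,
    carry_thinking: bool = True,
) -> Tuple[int, int]:
    """Return (total input tokens, total generated tokens) for a multi-step agent run."""
    context = 0
    total_in = total_out = 0
    for _ in range(steps):
        context += profile.new_input_tokens
        total_in += context  # the whole accumulated context is re-read each step
        thinking = round(profile.thinking_tokens * thinking_scale)
        total_out += thinking + profile.output_tokens
        # Generated tokens become input for later steps (thinking only if carried).
        context += profile.output_tokens + (thinking if carry_thinking else 0)
    return total_in, total_out


profile = StepProfile(new_input_tokens=1_000, thinking_tokens=2_000, output_tokens=200)
baseline = agent_run_tokens(3, profile)
efficient = agent_run_tokens(3, profile, thinking_scale=0.7)
print(baseline, efficient)  # → (12600, 6600) (10800, 4800)
```

With this illustrative profile, the 0.7× thinking run saves tokens on the input side too (12,600 down to 10,800), because less reasoning is carried forward into later steps, and that gap grows with the number of steps.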
Upcoming MLOps Events
The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, outlined below.
Events we are speaking at this year:
Signals Conference - September @ Berlin
World Summit AI Europe - September @ Amsterdam
Other relevant events:
KubeCon Europe - March @ Amsterdam
PyData Berlin - April @ Frankfurt
Databricks Summit - June @ San Francisco
World Developer Congress - July @ Berlin
EuroPython 2026 - July @ Prague
EuroSciPy 2026 - July @ Krakow
AI Infra Summit 2026 - September @ California
Code.Talks 2026 - Nov @ Hamburg
MLOps World 2026 - Nov @ Austin
In case you missed our talks, check our recordings below:
The State of AI in 2025 - WeAreDevelopers 2025
Prod Generative AI in 2024 - KubeCon AI Day 2025
The State of AI in 2024 - WeAreDevelopers 2024
Responsible AI Workshop Keynote - NeurIPS 2021
Practical Guide to ML Explainability - PyCon London
ML Monitoring: Outliers, Drift, XAI - PyCon Keynote
Metadata for E2E MLOps - KubeCon NA 2022
ML Performance Evaluation at Scale - KubeCon Europe 2021
Industry Strength LLMs - PyData Global 2022
ML Security Workshop Keynote - NeurIPS 2022
Open Source MLOps Tools
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add; if you know of any that are not listed, please let us know or feel free to add a PR. Here are a few featured open source libraries that we maintain:
SARC - Provides wrappers for popular agentic frameworks to enable guardrails and constraints that are enforced through the flow.
KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing use cases.
Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems, with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
About us
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
