Issue #378 - The ML Engineer
The State of Data Agents, Agentic Engineering Quality, Scaling Python Performance, Hacking McKinsey's Platform, DuckDB Benchmarks on the MacBook Neo + more
Thank you for being one of the 70,000+ ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps. You can join the newsletter at https://bit.ly/state-of-ml-2025
If you like the content, please support the newsletter by sharing it with your friends via Email, Twitter, LinkedIn and Facebook!
This week in ML Engineering:
The State of Data Agents
Agentic Engineering Quality
Scaling Python Performance
Hacking McKinseyâs Platform
DuckDB Benchmarks on the MacBook Neo
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more
The State of Data Agents
Every organisation is trying to build its own data agents, but what does that even mean? This great, comprehensive paper tackles the question by standardising terminology and challenges across industry. It argues that "data agents" need a clearer framework covering the many flavours in which they appear before the term becomes meaningless, and proposes a six-level taxonomy from L0 (fully manual workflows) to L5 (fully autonomous); it is interesting to see terminology emerging even at the fringes of agentic tooling. For production ML practitioners, the useful takeaway is that most real systems today are better thought of as assisted tooling or partial-autonomy operators rather than truly autonomous end-to-end agents: they can help with tuning, cleaning, querying, retrieval, reporting, and multi-step analysis, but they still need human-designed workflows, guardrails, and supervision. Like everything else in the agentic field, this is a rapidly evolving domain, so it will be interesting to see how it changes even within the year.
Agentic Engineering Quality
As of today, many of the PRs that coding agents write would not be merged; this is a really interesting finding from a recent METR study. The research argues that SWE-bench pass rates materially overstate real-world coding usefulness: across 296 AI-generated PRs reviewed by actual maintainers of scikit-learn, Sphinx, and pytest, roughly half of the benchmark-passing patches would still not be merged, even after normalizing against human-written "golden" patches to account for reviewer noise. For production ML practitioners, the key takeaway is a reminder that optimizing for a metric may not improve performance if the metric is not directly aligned with the actual objective. The paper does not claim that current agents fundamentally cannot improve with better prompting or iteration, but it does show that benchmark scores alone can mislead teams evaluating coding agents for real software workflows.
Scaling Python Performance
"Python's performance sucks" - yes, but that's not the end of the story. Can Python be fast? Yes. Performance engineering in Python is not a niche concern, so it's important to know the "optimization ladder" available to us, and which rungs to climb for real performance gains:
1) Upgrade CPython for small, essentially free wins.
2) Compile your typed Python with mypyc, which can deliver strong wins if your code is already annotated.
3) Leverage NumPy/JAX for massive gains on vectorizable array math.
4) Use Numba to accelerate numeric loops over arrays.
5) If none of these work, go low level and rebuild core components in Cython/Rust/etc.
The most practically useful insight is that realistic pipelines often bottleneck on Python object creation and parsing, not raw compute, so the biggest gains can come from changing data representations or moving parsing and hot paths out of Python objects entirely. This is a great article on practical Python performance optimization; it's often best to go back to the foundations to drive the most value.
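As a minimal sketch of rung 3 on the ladder above (the function names and data sizes are illustrative, not from the article), the toy benchmark below compares a pure-Python sum-of-squares loop against the equivalent NumPy call; the boxed-object loop is typically far slower:

```python
import timeit
import numpy as np

def py_sum_squares(xs):
    # Pure-Python loop: every element is a boxed PyObject,
    # so each iteration pays interpreter and allocation overhead.
    total = 0.0
    for x in xs:
        total += x * x
    return total

data = list(range(100_000))
arr = np.array(data, dtype=np.float64)

# Same computation, vectorized: one C-level dot product over a flat buffer.
py_t = timeit.timeit(lambda: py_sum_squares(data), number=10)
np_t = timeit.timeit(lambda: float(np.dot(arr, arr)), number=10)
print(f"pure python: {py_t:.4f}s  numpy: {np_t:.4f}s")
```

The point is less the exact speedup than where it comes from: the NumPy version avoids creating and destroying Python objects in the hot loop, which is exactly the bottleneck the article highlights.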
Hacking McKinsey's Platform
AI agents are making SQL injection ubiquitous again! McKinsey appears to be the latest victim of agentic vulnerabilities: CodeWall claims its autonomous agent found an unauthenticated SQL injection in McKinsey's internal AI platform (aka Lilli), and chained it with other weaknesses to gain read/write access to production data, including chat logs, files, user accounts, system prompts, and RAG metadata. This is brutal, and a stark reminder that AI platforms inherit classic application-security risks while adding new, higher-impact failure modes around prompts, RAG data, and agent workflows. For production ML practitioners, the main takeaway is that securing the model is not enough: the real attack surface spans APIs, document pipelines, vector stores, prompt/config storage, and authorization boundaries. Indeed, let's not make SQL injection ubiquitous again!
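The classic mitigation still applies to any SQL an agent touches: bind user (or model) input as parameters instead of interpolating it into the query string. A minimal sketch using Python's built-in sqlite3, with a hypothetical table and a classic injection payload standing in for any real backend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "nobody' OR '1'='1"  # classic injection payload

# UNSAFE: string interpolation lets the payload rewrite the WHERE clause,
# so the query matches every row despite the bogus name.
unsafe_sql = f"SELECT id FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: a bound parameter is treated purely as data, never as SQL,
# so the same payload matches nothing.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE name = ?", (user_input,)
).fetchall()

print("unsafe:", unsafe_rows)  # injection succeeds, returns all users
print("safe:  ", safe_rows)    # empty result
```

The same discipline extends to agent platforms: any text that flows from a prompt, a document, or a retrieved chunk into a query must cross an API that only accepts bound parameters.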
DuckDB Benchmarks on the MacBook Neo
We all saw the launch of the MacBook Neo last week, but the real question is: how much can it DuckDB? The answer is of course "yes": the DuckDB team ran a small benchmark on the entry-level MacBook Neo as a useful reminder for ML practitioners that local analytics performance is increasingly good enough for nontrivial data work, even on constrained hardware. It was surprising to see that DuckDB, with tuned memory limits and out-of-core execution, performed quite competitively despite the machine sharing a chip with the iPhone (and perhaps having less RAM than some phone models). If your objective is serious local data computation, this isn't the hardware to buy; the more interesting question is what gets unlocked as more capabilities move to edge processing. DuckDB really can make low-cost laptops viable for occasional large-scale local analysis, prototyping, and client-side data exploration - and, as we know, agents love these.
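The kind of tuning mentioned above comes down to a few DuckDB settings; the values below are illustrative for a low-RAM machine, not the benchmark's actual configuration:

```sql
-- Cap working memory so large queries spill to disk instead of failing
SET memory_limit = '4GB';
-- Directory for out-of-core spill files (path is illustrative)
SET temp_directory = '/tmp/duckdb_spill';
-- Match the thread count to the small core count of entry-level hardware
SET threads = 4;
```

With a memory limit and a temp directory set, DuckDB's out-of-core operators can process datasets well beyond RAM, which is what makes constrained laptops viable for occasional large analyses.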
Upcoming MLOps Events
The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, outlined below.
Events we are speaking at this year:
eTail Europe - March @ Berlin
World Summit AI Europe - September @ Amsterdam
Other relevant events:
KubeCon Europe - March @ Amsterdam
PyData Berlin - April @ Frankfurt
Databricks Summit - June @ San Francisco
World Developer Congress - July @ Berlin
EuroPython 2026 - July @ Prague
EuroSciPy 2026 - July @ Krakow
Code.Talks 2026 - Nov @ Hamburg
MLOps World 2026 - Nov @ Austin
In case you missed our talks, check our recordings below:
The State of AI in 2025 - WeAreDevelopers 2025
Prod Generative AI in 2024 - KubeCon AI Day 2025
The State of AI in 2024 - WeAreDevelopers 2024
Responsible AI Workshop Keynote - NeurIPS 2021
Practical Guide to ML Explainability - PyCon London
ML Monitoring: Outliers, Drift, XAI - PyCon Keynote
Metadata for E2E MLOps - Kubecon NA 2022
ML Performance Evaluation at Scale - KubeCon Eur 2021
Industry Strength LLMs - PyData Global 2022
ML Security Workshop Keynote - NeurIPS 2022
Open Source MLOps Tools
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 GitHub stars ⭐. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to open a PR. Here are a few featured open source libraries that we maintain:
KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems, with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
About us
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
