Issue #379 - The ML Engineer
Scaling Karpathy AutoResearch, Mamba-3 State Space Model, META's No Language Left Behind LLM, Karpathy's No Priors Podcast, AI Redrawing Databases + more
Thank you for being part of over 70,000 ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps. You can join the newsletter at https://ethical.institute/mle.html
If you like the content, please support the newsletter by sharing it with your friends via Email, Twitter, LinkedIn and Facebook!
This week in ML Engineering:
Scaling Karpathy AutoResearch
Mamba-3 State Space Model
META's No Language Left Behind LLM
Karpathy's No Priors Podcast
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more
Scaling Karpathy AutoResearch
Data science as a profession is going to change massively in the next few years, and Karpathy's autoresearch method will have more impact than you might think. This is a great example: giving an autonomous tuning agent access to parallel GPU infrastructure turns it from a sequential hyperparameter fiddler into a far more capable search system. The project wires Karpathy's autoresearch loop to a 16-GPU Kubernetes cluster where Claude Code is let loose, running 910 experiments over 8 hours, achieving a 2.87% validation improvement versus baseline and reaching the same quality roughly 9x faster than a single-GPU sequential setup. It is hard to overstate how significant this is; it is not a gimmick but a real "Claude" moment for many deep learning workflows in particular. In the coming weeks (not even months) we will most likely see iterations of this framework for some of the most popular libraries, and with them impressive glimpses of what is yet to come.
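The exact harness is not spelled out in this summary, but the core idea can be sketched in a few lines: an agent proposes experiment configs, a pool of workers (standing in for the 16 GPUs) runs them in parallel, and the best result so far seeds the next round. Everything below is hypothetical, with a toy objective standing in for a real training run:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_and_eval(config):
    # Stand-in for a real training run: a smooth toy objective,
    # maximised at lr=0.003, wd=0.01. Higher score is better.
    return -(config["lr"] - 0.003) ** 2 - (config["wd"] - 0.01) ** 2

def propose(best, scale=0.5):
    # Crude stand-in for the agent reading results and proposing
    # the next experiment: perturb the best config found so far.
    return {k: v * random.uniform(1 - scale, 1 + scale) for k, v in best.items()}

def parallel_search(n_rounds=8, n_workers=16):
    best = {"lr": 0.01, "wd": 0.05}
    best_score = train_and_eval(best)
    # In the real setup each worker would be a GPU job on Kubernetes;
    # here a thread pool evaluates one config per "GPU" each round.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_rounds):
            configs = [propose(best) for _ in range(n_workers)]
            for cfg, score in zip(configs, pool.map(train_and_eval, configs)):
                if score > best_score:
                    best, best_score = cfg, score
    return best, best_score

best, score = parallel_search()
print(best, score)
```

The speedup in the article comes from exactly this shape: each round explores 16 configs in the wall-clock time a sequential setup spends on one.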
Mamba-3 State Space Model
Mamba-3 is live! This is exciting because inference is now a top production concern: the models that win will not just be the smartest, but the ones that deliver the best latency, throughput, and cost profile at scale. Mamba-3 reframes state space models around deployment rather than training: it adds a better recurrence, complex-valued state dynamics, and an interesting MIMO variant to improve the quality/latency tradeoff without sacrificing linear-time decoding. For production ML practitioners, the key takeaway is that this pushes SSMs closer to being a practical inference architecture, with Mamba-3 SISO beating Mamba-2, Gated DeltaNet, and even a 1.5B Llama Transformer.
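Mamba-3's actual recurrence (with its improved discretisation, complex states, and MIMO option) is in the paper; as a rough illustration of why SSM decoding is linear-time, here is a minimal diagonal state-space recurrence with a constant-size state. All parameter values below are made up:

```python
def ssm_decode(a, b, c, xs):
    """Per-channel recurrence h_t = a*h_{t-1} + b*x_t, y_t = sum(c*h_t).

    The state h has fixed size d regardless of sequence length, so each
    decode step costs O(d) -- no KV cache that grows with context, which
    is the core deployment advantage of SSMs over attention."""
    d = len(a)
    h = [0.0] * d
    ys = []
    for x in xs:  # one token at a time, as in autoregressive decoding
        h = [a[i] * h[i] + b[i] * x[i] for i in range(d)]
        ys.append(sum(c[i] * h[i] for i in range(d)))
    return ys

# Toy 2-channel state with made-up parameters and a 3-token input.
a, b, c = [0.9, 0.5], [1.0, 1.0], [0.3, 0.7]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = ssm_decode(a, b, c, xs)
print(ys)
```

Real Mamba-layer dynamics are input-dependent and learned rather than fixed scalars, but the constant-memory, per-step recurrence is the same property this sketch demonstrates.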
META's No Language Left Behind LLM
Huge kudos to META for their "No Language Left Behind" LLM initiative! They built an ML translation system that supports more than 1,600 languages, with a particular focus on long-tail and underserved languages! For ML practitioners the main takeaway is that the gains come less from scaling generic models and more from end-to-end system design: broader and cleaner multilingual data pipelines, synthetic data generation, tokenizer/vocabulary expansion, specialized MT training recipes, retrieval-augmented translation, and much stronger evaluation tooling. The paper's most operationally relevant result is the efficiency curve: the specialized 1B-8B MT models reportedly match or beat a 70B general LLM baseline on translation quality, which means a much better cost/quality tradeoff for production translation workloads. The acceleration we are seeing in language translation is genuinely mind-blowing, and it is great to see it extended to underserved languages as well.
Karpathy's No Priors Podcast
Andrej Karpathy has predicted the future and impact of agentic systems at every stage, and this is one of his best podcasts yet, where he shares the following main insights: Karpathy argues that the practical frontier has shifted from writing code to orchestrating many coding agents, where the key bottleneck is no longer typing or even compute alone, but the operator's ability to structure tasks, prompts, memory, and evaluation loops. For production ML practitioners, the most important takeaway is that agents already work best in domains with clear objectives and verifiable metrics - software engineering, systems optimization, hyperparameter tuning, kernel work, and experimental search. The near-term opportunity is not vague "AI automation", but building autonomous loops around measurable workflows. In this podcast he also talks about auto-research, the minigpt project, claw-like systems and more - definitely worth checking out.
AI Redrawing Databases
The next wave of AI product differentiation will be won or lost in the data layer: the teams that can serve real-time, full-fidelity, agent-ready data cheaply and reliably will ship better AI systems faster, and ClickHouse shares a great perspective. ClickHouse argues that AI is not just adding load to existing data systems, but changing the shape of production data workloads altogether; agentic apps, conversational analytics, and AI-driven SRE all require high-concurrency, low-latency access to full-fidelity data, which batch-oriented warehouses and siloed observability stacks handle poorly. For production ML teams, the practical takeaway is that serving AI features reliably now depends on the data plane as much as the model layer: transactional and analytical systems need tighter integration, natural-language analytics can fan out into many parallel queries, and LLM/agent observability needs long-retention, granular event data rather than sampled telemetry.
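The fan-out pattern is easy to picture with a toy sketch (this is not ClickHouse's API): one conversational question such as "how is latency looking?" gets decomposed into several independent aggregate queries issued concurrently against an event store. The decomposition, field names, and in-memory "store" here are all hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Toy event log standing in for a full-fidelity analytics store.
events = [
    {"service": "api", "latency_ms": 12.0},
    {"service": "api", "latency_ms": 48.0},
    {"service": "db", "latency_ms": 5.0},
    {"service": "db", "latency_ms": 7.0},
]

# One natural-language question fanned out into several independent
# aggregates (a hypothetical decomposition an LLM layer might produce).
queries = {
    "event_count": lambda: len(events),
    "avg_latency_ms": lambda: mean(e["latency_ms"] for e in events),
    "slowest_service": lambda: max(events, key=lambda e: e["latency_ms"])["service"],
}

# Issue all sub-queries concurrently -- the high-concurrency access
# pattern that batch-oriented warehouses handle poorly.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(fn) for name, fn in queries.items()}
    results = {name: f.result() for name, f in futures.items()}
print(results)
```

In a real system each lambda would be a SQL query against the analytical store, and a single chat turn can trigger dozens of them at once, which is exactly the concurrency profile the article is pointing at.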
Upcoming MLOps Events
The MLOps ecosystem continues to grow at break-neck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, which are outlined below.
Events we are speaking at this year:
eTail Europe - March @ Berlin
World Summit AI Europe - September @ Amsterdam
Other relevant events:
KubeCon Europe - March @ Amsterdam
PyData Berlin - April @ Frankfurt
Databricks Summit - June @ San Francisco
World Developer Congress - July @ Berlin
EuroPython 2026 - July @ Prague
EuroSciPy 2026 - July @ Krakow
Code.Talks 2026 - Nov @ Hamburg
MLOps World 2026 - Nov @ Austin
In case you missed our talks, check our recordings below:
The State of AI in 2025 - WeAreDevelopers 2025
Prod Generative AI in 2024 - KubeCon AI Day 2025
The State of AI in 2024 - WeAreDevelopers 2024
Responsible AI Workshop Keynote - NeurIPS 2021
Practical Guide to ML Explainability - PyCon London
ML Monitoring: Outliers, Drift, XAI - PyCon Keynote
Metadata for E2E MLOps - Kubecon NA 2022
ML Performance Evaluation at Scale - KubeCon Eur 2021
Industry Strength LLMs - PyData Global 2022
ML Security Workshop Keynote - NeurIPS 2022
Open Source MLOps Tools
Check out the fast-growing ecosystem of production ML tools & frameworks in the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to add a PR. Here's a few featured open source libraries that we maintain:
KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
About us
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
