Issue #388 - The ML Engineer 🤖

Making DL Go Brr with First Principles, Benedict Evans: AI Eats the World 2026, Gemini 3.5 Frontier Intelligence, (Sk)Forecast Foundation Models, NVIDIA's New Image/Video Model + more 🚀

May 24, 2026

Thank you for being one of over 70,000 ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps 🤖 You can join the newsletter at https://bit.ly/state-of-ml-2025 ⭐

If you like the content, please support the newsletter by sharing it with your friends via ✉️ Email, 🐦 Twitter, 💼 LinkedIn and 📕 Facebook!

This week in ML Engineering:

Making DL Go Brr with First Principles
Benedict Evans: AI Eats the World 2026
Gemini 3.5 Frontier Intelligence
(Sk)Forecast Foundation Models
NVIDIA’s New Image/Video Model
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more 🚀

Making DL Go Brr with First Principles

The classic “Making Deep Learning Go Brrrr From First Principles” still offers highly relevant advice for AI teams today: instead of throwing random PyTorch tricks at slow models, build a clean mental model for diagnosing performance across three regimes: 1) compute-bound, 2) memory-bandwidth-bound, and 3) overhead-bound. Each regime calls for a different optimization path. Compute-bound workloads need better Tensor Core usage or more hardware. Bandwidth-bound workloads benefit most from operator fusion and avoiding unnecessary global memory reads/writes. Overhead-bound workloads usually need tracing, compilation, CUDA Graphs, or lower Python/framework dispatch costs. For us production ML practitioners it’s a good reminder that GPU efficiency is not just about bigger accelerators, but about understanding where time is actually spent.
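To make the three regimes concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the article: the `Accelerator` numbers are made up for illustration, and `estimate_regime` with its `overhead_factor` threshold is a hypothetical helper. It simply compares the ideal compute time and ideal memory time against a measured wall-clock time.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical hardware spec; the numbers used below are illustrative, not a real GPU."""
    peak_flops: float      # floating point operations per second
    mem_bandwidth: float   # bytes per second to/from global memory


def estimate_regime(flops, bytes_moved, measured_seconds, hw, overhead_factor=3.0):
    """Classify a workload as compute-, bandwidth- or overhead-bound.

    compute_time and memory_time are ideal lower bounds. If the measured time
    is much larger than both, the accelerator is mostly idle and overhead
    (Python, framework dispatch, kernel launches) is the likely culprit.
    The overhead_factor threshold is an arbitrary assumption for this sketch.
    """
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.mem_bandwidth
    ideal = max(compute_time, memory_time)
    if measured_seconds > overhead_factor * ideal:
        return "overhead-bound"
    return "compute-bound" if compute_time >= memory_time else "bandwidth-bound"


hw = Accelerator(peak_flops=100e12, mem_bandwidth=1e12)

# Elementwise op on 1e8 fp32 values: 1 FLOP per element, 4 bytes read + 4 bytes written.
n = 100_000_000
elementwise = estimate_regime(flops=n, bytes_moved=8 * n, measured_seconds=1e-3, hw=hw)

# Square fp32 matmul, 4096 x 4096: 2 * N^3 FLOPs, three N x N matrices moved.
N = 4096
matmul = estimate_regime(flops=2 * N**3, bytes_moved=3 * 4 * N * N, measured_seconds=2e-3, hw=hw)

# Tiny op whose ideal time is well under a microsecond but which takes 1 ms end to end.
tiny = estimate_regime(flops=1e4, bytes_moved=8e4, measured_seconds=1e-3, hw=hw)

print(elementwise, matmul, tiny)
```

With these assumed numbers the elementwise op lands in the bandwidth-bound bucket (fusion helps), the large matmul is compute-bound (Tensor Cores or more hardware help), and the tiny op is overhead-bound (compilation or CUDA Graphs help). Real profiling should replace the hand-written FLOP and byte counts.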

Benedict Evans: AI Eats the World 2026

Benedict Evans has dropped the 2026 “AI Eats the World” deck, and here are the main highlights: GenAI so far has meant huge capex first, unclear value capture, lots of hype, and only later the boring-but-transformational deployment layer. The model race is still going, but models are converging, infrastructure is getting brutally expensive, and the real leverage is likely to come from teams that turn LLMs into reliable workflow automation. There is still a lot of value to come from new aggregation/discovery layers and from domain-specific products that change how work is done. The most important takeaway is that AI adoption will probably look slow and underwhelming inside enterprises until it suddenly becomes standard and expected.

Gemini 3.5 Frontier Intelligence

Google DeepMind has just released Gemini 3.5 Flash! It is pitched as a faster agentic execution model across coding, tool use, multimodal understanding and long-horizon workflows. The interesting bit for production ML practitioners is that Google is positioning Flash as the high-throughput model for real-world agents, and is claiming strong results for it. For ML teams, the takeaway is the infrastructure pattern: faster frontier models plus agent harnesses are becoming the default competitive advantage.

(Sk)Forecast Foundation Models

Skforecast is making time-series foundation models much easier to test in real production forecasting workflows: its new release wires in Chronos, TimesFM, Moirai, and TabICL! Really great to see Skforecast leading the charge on making foundation models accessible, with various new classes (e.g. FoundationModel + ForecasterFoundation) that wrap foundation models behind sklearn-style interfaces. Zero-shot forecasting is now something you can benchmark inside existing forecasting pipelines rather than treat as a separate research experiment (and it works surprisingly well). There are still challenges around context length and feature parity: longer windows help models see seasonality and regime patterns, but they also increase inference cost and latency, so teams still need proper backtesting rather than assuming bigger context is better. The examples are also refreshingly honest about production caveats - definitely worth checking out.
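The point about backtesting context length can be sketched without any foundation model at all. The snippet below does not use the Skforecast API: `seasonal_mean_forecast` is a hypothetical stand-in forecaster, `backtest` is a hand-rolled rolling-origin loop, and the series is synthetic (a weekly pattern with a level shift halfway through). It only illustrates why the window size should be measured rather than assumed.

```python
def make_series():
    """Synthetic weekly pattern with a level shift at t=70 (illustrative data only)."""
    pattern = [10, 12, 15, 11, 9, 20, 22]
    return [pattern[t % 7] + (10 if t >= 70 else 0) for t in range(140)]


def seasonal_mean_forecast(context, horizon, period):
    """Stand-in forecaster: average the context values that share the target's phase."""
    n = len(context)
    preds = []
    for h in range(horizon):
        phase = (n + h) % period
        same_phase = [v for i, v in enumerate(context) if i % period == phase]
        # Fall back to the last observed value if the context never saw this phase.
        preds.append(sum(same_phase) / len(same_phase) if same_phase else context[-1])
    return preds


def backtest(series, context_length, horizon, period, n_folds):
    """Rolling-origin backtest over the last n_folds * horizon points; returns the MAE."""
    errors = []
    for fold in range(n_folds):
        end = len(series) - (n_folds - fold) * horizon
        context = series[max(0, end - context_length):end]
        actual = series[end:end + horizon]
        preds = seasonal_mean_forecast(context, horizon, period)
        errors.extend(abs(p - a) for p, a in zip(preds, actual))
    return sum(errors) / len(errors)


series = make_series()
results = {
    ctx: backtest(series, context_length=ctx, horizon=7, period=7, n_folds=4)
    for ctx in (3, 14, 84)
}
for ctx, mae in results.items():
    print(f"context={ctx:>3}  MAE={mae:.3f}")
```

On this toy data the shortest window misses the weekly seasonality, while the longest one drags in the old regime, so the mid-sized window scores best. With a real foundation model you would swap the stand-in forecaster for the model call and also record latency per fold.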

NVIDIA’s New Image/Video Model

NVIDIA just dropped a high-efficiency open-source stack for high-resolution image, video, and world-model generation! This looks like an exciting addition for production ML teams because it focuses on the deployment constraints that usually decide whether generative media systems are practical: latency, VRAM, training cost, quantization, and serving integration. NVIDIA positions it as a complete training and inference codebase, with techniques such as linear attention, 32× DC-AE latent compression, Flow-DPM-Solver sampling, few-step sCM distillation, and block causal linear attention for long video generation.
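To give a feel for why linear attention and aggressive latent compression matter for cost, here is a small generic sketch. It is not NVIDIA's implementation: the `phi` feature map (elu(x) + 1) is an assumed choice, the code is the non-causal form of linear attention, and `latent_tokens` assumes the compression factor applies per spatial side with one token per latent pixel.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice for this sketch)."""
    return x + 1.0 if x > 0 else math.exp(x)


def quadratic_attention(Q, K, V):
    """Kernelized attention via explicit pairwise scores: O(n^2 * d)."""
    out = []
    for q in Q:
        fq = [phi(x) for x in q]
        scores = [sum(a * phi(b) for a, b in zip(fq, k)) for k in K]
        z = sum(scores)
        out.append([sum(s * v[j] for s, v in zip(scores, V)) / z for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same result, but keys/values are summarized once: O(n * d^2)."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum_i phi(k_i) v_i^T, shape d x dv
    k_sum = [0.0] * d                    # sum_i phi(k_i)
    for k, v in zip(K, V):
        fk = [phi(x) for x in k]
        for a in range(d):
            k_sum[a] += fk[a]
            for j in range(dv):
                kv[a][j] += fk[a] * v[j]
    out = []
    for q in Q:
        fq = [phi(x) for x in q]
        z = sum(a * b for a, b in zip(fq, k_sum))
        out.append([sum(fq[a] * kv[a][j] for a in range(d)) / z for j in range(dv)])
    return out


def latent_tokens(height, width, downsample):
    """Token count, assuming the factor applies per side and one token per latent pixel."""
    return (height // downsample) * (width // downsample)


Q = [[0.1, -0.4], [0.7, 0.2], [-0.3, 0.5]]
K = [[0.2, 0.3], [-0.6, 0.1], [0.4, -0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

ref = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_diff = max(abs(a - b) for r, f in zip(ref, fast) for a, b in zip(r, f))

tokens_32x = latent_tokens(1024, 1024, 32)
tokens_8x = latent_tokens(1024, 1024, 8)
print(max_diff, tokens_32x, tokens_8x)
```

Both attention functions compute the same output, but the linear form never builds the n × n score matrix, so its cost grows linearly with the number of tokens. Under the stated assumption, a 32× autoencoder also leaves 16× fewer tokens to attend over than an 8× one for a 1024×1024 image.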

Upcoming MLOps Events

The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to keep on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is why we have started curating a list of upcoming events in the space, which are outlined below.

Events we are speaking at this year:

eTail Europe - March @ Berlin
World Summit AI Europe - September @ Amsterdam

Other relevant events:

KubeCon Europe - March @ Amsterdam
PyData Berlin - April @ Frankfurt
Databricks Summit - June @ San Francisco
World Developer Congress - July @ Berlin
EuroPython 2026 - July @ Prague
EuroSciPy 2026 - July @ Krakow
AI Infra Summit 2026 - Sept @ California
Code.Talks 2026 - Nov @ Hamburg
MLOps World 2026 - Nov @ Austin

In case you missed our talks, check our recordings below:

The State of AI in 2025 - WeAreDevelopers 2025
Prod Generative AI in 2024 - KubeCon AI Day 2025
The State of AI in 2024 - WeAreDevelopers 2024
Responsible AI Workshop Keynote - NeurIPS 2021
Practical Guide to ML Explainability - PyCon London
ML Monitoring: Outliers, Drift, XAI - PyCon Keynote
Metadata for E2E MLOps - Kubecon NA 2022
ML Performance Evaluation at Scale - KubeCon Eur 2021
Industry Strength LLMs - PyData Global 2022
ML Security Workshop Keynote - NeurIPS 2022

Open Source MLOps Tools

Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to open a PR. Here are a few featured open source libraries that we maintain:

SARC - Provides wrappers for popular agentic frameworks to enable guardrails and constraints that are enforced through the flow.
KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
Kompute - Blazing fast, lightweight and mobile phone-enabled GPU compute framework optimized for advanced data processing use cases.
Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain.

Please do support some of our open source projects by sharing, contributing or adding a star ⭐

About us

The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.

Check out our website

The Machine Learning Engineer
