Issue #380 - The ML Engineer 🤖
Agentic Engineering Practices, Recommendation System at BlueSky, Raschka's Open LLM Dream, Building a MCP Ecosystem at Pinterest, Dual Text-Forecasting Foundation Model + more
Thank you for being part of the over 70,000 ML professionals and enthusiasts who receive weekly articles & tutorials on Machine Learning & MLOps 🤖 You can join the newsletter at https://ethical.institute/mle.html
If you like the content please support the newsletter by sharing with your friends via ✉️ Email, 🐦 Twitter, 💼 LinkedIn and 📘 Facebook!
This week in ML Engineering:
Agentic Engineering Practices
Recommendation System at BlueSky
Raschka's Open LLM Dream
Building a MCP Ecosystem at Pinterest
Dual Text-Forecasting Foundation Model
Open Source ML Frameworks
Awesome AI Guidelines to check out this week
+ more
Agentic Engineering Practices
AI coding agents are transforming software development faster than we can settle on best practices, which is why it's refreshing to see Simon Willison's latest take: there is consensus that there is a world before November and a world after; coding agents are now good enough to write substantial amounts of production-relevant code. Simon describes how his workflow leverages red-green TDD, reusable project templates, and executable validation such as running servers and probing APIs, which lets him trust agents more while reducing the need to manually review every line. For production ML practitioners, the important lesson is that agents should be treated as powerful but untrusted collaborators whose output quality depends heavily on the scaffolding (and instructions) around them, especially tests, clear constraints, and consistent codebase patterns. We certainly cannot ignore the security risks, which grow quickly when agents have access to sensitive data, external inputs, and ways to exfiltrate information. It will become more and more important to properly sandbox agents, leverage synthetic data, and minimize permissions wherever they are not necessary.
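The red-green TDD loop described above can be sketched in a few lines. This is a minimal illustration, not Simon's actual code; `slugify` is a hypothetical function an agent might be asked to implement, with the test written first so the agent's output is validated by execution rather than by eyeballing the diff.

```python
import re

# RED: the test is written before the implementation exists, so running
# it first fails. The agent then writes code until it goes GREEN.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# GREEN: a candidate implementation produced against the failing test.
def slugify(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)  # collapse non-alphanumerics
    return text.strip("-")

test_slugify()  # executable validation: passes only if behavior matches
```

The point is that the test, not the reviewer, is the trust boundary: the agent can iterate freely as long as the executable checks keep passing.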
Recommendation System at BlueSky
Super interesting to see the recommendation system design at Bluesky, which appears to be based on the Pinterest architecture with a few tweaks: Bluesky chose a recommender system architecture for their Discover-feed personalization under tight constraints on data, cost, and ML engineering resources. It's interesting to see that they attempted a two-tower retrieval model but it failed to converge, so they fell back to content-based post embeddings using BLIP2 plus topic models and HDBSCAN clusters to build a basic personalization layer. They are now exploring Pinterest's PinnerSage recsys architecture, which promises to be a better candidate-generation approach because it keeps item embeddings fixed, avoids heavy fine-tuning, and models users as multiple interest vectors rather than a single embedding. For production ML practitioners, the core takeaway is that architectures like PinnerSage come with real tradeoffs: clustering recent interactions offers an operationally attractive way to capture both long-term and short-term user intent, but it shifts complexity downstream, because multi-interest user representations are straightforward for ANN retrieval yet awkward and expensive to use in ranking.
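The multi-interest idea can be sketched with synthetic data: cluster a user's recent item embeddings, take one medoid per cluster as an interest vector, and retrieve candidates per interest. This is a toy approximation (PinnerSage itself uses Ward clustering over Pin embeddings, not the tiny k-means used here), and all data and names below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed item embeddings, plus one user's recent interactions
# drawn from two distinct "interest" regions of the embedding space.
catalog = rng.normal(size=(1000, 32))
recent = np.vstack([
    rng.normal(loc=+2.0, size=(20, 32)),   # interest region A
    rng.normal(loc=-2.0, size=(20, 32)),   # interest region B
])

def kmeans(x, k, iters=20):
    """Tiny k-means stand-in for PinnerSage's Ward clustering."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([
            x[labels == j].mean(0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

# Represent the user as multiple interest vectors (cluster medoids),
# not a single averaged embedding that would blur distinct interests.
labels = kmeans(recent, k=2)
medoids = []
for j in range(2):
    members = recent[labels == j]
    if members.size == 0:                  # fallback; shouldn't happen here
        members = recent
    d = ((members[:, None] - members[None]) ** 2).sum(-1).sum(1)
    medoids.append(members[np.argmin(d)])  # medoid = min total distance

# Candidate generation: nearest catalog items per interest vector.
def top_k(query, items, k=5):
    sims = items @ query / (np.linalg.norm(items, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

candidates = {j: top_k(m, catalog) for j, m in enumerate(medoids)}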
Raschka's Open LLM Dream
Sebastian Raschka has dropped another masterclass on open-weight LLM architectures, this time sharing key insights from the Jan-Feb 2026 launches: no single architecture has emerged as dominant, but the field is clearly converging on a shared set of trends and best practices. These global trends include better long-context efficiency, lower KV-cache / latency costs, stronger coding / agentic performance, and more practical quality-per-token tradeoffs. Across this year's models, the main pattern is the rise of increasingly specialized efficiency techniques such as hybrid attention, sliding-window attention, MLA, sparse attention, and multi-token prediction - especially in large MoE systems like GLM-5, Kimi K2.5, Qwen3.5, and Ling 2.5. The key takeaway for production ML teams is that architecture still matters, but less as a search for one universally best design and more as a way to optimize for serving constraints, context length, throughput, and workload fit.
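To make one of these efficiency techniques concrete, here is a minimal sketch of a causal sliding-window attention mask, one of the patterns listed above. Each token attends only to itself and the previous `window - 1` positions, which is what bounds KV-cache memory to the window size instead of the full sequence length.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where entry (i, j) is True if query position i may
    attend to key position j: causal (j <= i) and within the window."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=6, window=3)
# Row 5 attends only to positions 3, 4, 5; position 0 only to itself.
```

Real implementations fuse this into the attention kernel rather than materializing the mask, but the memory argument is the same: only `window` keys and values per layer ever need to stay cached.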
Building a MCP Ecosystem at Pinterest
The race to make AI agents actually useful in production will be won or lost on platform design, not model hype, and Pinterest's MCP ecosystem learnings show how much leverage this can have: Pinterest is building an internal platform for production MCP services that feed agent workflows. They established an ecosystem of cloud-hosted MCP servers with a central discovery and governance layer, added a shared deployment path so teams can publish tools without owning all the infrastructure, and integrated these servers into the IDE, chat, and internal AI surfaces engineers already use. The main lesson for production ML practitioners is that MCP only becomes operationally useful when paired with strong platform controls like registry-based approval and layered authn/authz with user JWTs and service identities.
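The registry-plus-authz pattern can be sketched as a simple gate in front of tool calls. Everything here is illustrative, not Pinterest's actual API: `REGISTRY`, the server names, and the scope strings are invented to show the shape of the control, where a call is allowed only if the server is approved centrally and the caller's identity carries the required scope.

```python
# Hypothetical central registry: approval status and required scope
# per MCP server (in practice this would be backed by a service, and
# scopes would come from a verified user JWT or service identity).
REGISTRY = {
    "ads-insights-mcp": {"approved": True,  "required_scope": "ads:read"},
    "exp-rollout-mcp":  {"approved": False, "required_scope": "exp:write"},
}

def authorize(server: str, caller_scopes: set[str]) -> bool:
    entry = REGISTRY.get(server)
    if entry is None or not entry["approved"]:
        return False                                  # unknown or unapproved server
    return entry["required_scope"] in caller_scopes   # layered authz check

allowed = authorize("ads-insights-mcp", {"ads:read"})   # approved + scoped
blocked = authorize("exp-rollout-mcp", {"exp:write"})   # scoped but not approved
```

The key design point is that approval and authorization are enforced at the platform layer, so individual teams publishing tools never have to get this right on their own.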
Dual Text-Forecasting Foundation Model
Forecasting is where machine learning stops being interesting and starts being operationally decisive: better predictions directly shape revenue, inventory, risk, capacity, and planning, and even modest accuracy gains can compound into major business impact at scale. Migas 1.5 presents a pragmatic multimodal forecasting architecture for production settings: instead of training a single end-to-end model over text and time series, it keeps a standard time-series foundation model as the forecasting backbone, uses language models to extract structured contextual signals from text, and then applies a learned correction model to adjust the baseline forecast. The reported results across 86 real-world multimodal datasets suggest that this setup materially improves accuracy over unimodal baselines, especially in short-history or regime-shift scenarios where historical values alone are insufficient, with gains of up to 14.2% MAE reduction. For ML practitioners, the most notable contribution is less the benchmark win itself than the systems pattern it implies: event-aware forecasting can be added modularly to existing pipelines, and scarce aligned text-plus-time-series supervision can be bootstrapped with synthetic annotations generated by LLMs, though teams should still validate carefully for leakage, annotation quality, and robustness to noisy context.
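The modular pattern described above - frozen forecasting backbone, text-derived features, learned correction - can be sketched with synthetic data. This is not the Migas 1.5 implementation; the baseline forecasts and "LLM-extracted" features below are simulated, and the correction model is a plain least-squares regression on the residual to show the shape of the pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated setup: a frozen time-series backbone emits baseline
# forecasts; an LLM (simulated here) turns text context into a small
# feature vector; a correction model adjusts the baseline.
n = 500
baseline = rng.normal(100.0, 10.0, size=n)       # backbone forecasts
text_feats = rng.normal(size=(n, 4))             # text-derived signals
true_effect = text_feats @ np.array([5.0, -3.0, 0.0, 2.0])
actual = baseline + true_effect + rng.normal(0.0, 1.0, size=n)

# Correction model: regress the residual (actual - baseline) on the
# text features, leaving the backbone itself untouched.
residual = actual - baseline
X = np.hstack([text_feats, np.ones((n, 1))])     # add intercept column
w, *_ = np.linalg.lstsq(X, residual, rcond=None)

corrected = baseline + X @ w
mae_base = np.abs(actual - baseline).mean()
mae_corr = np.abs(actual - corrected).mean()
# mae_corr drops below mae_base whenever the text features carry real
# information about the residual - the core claim of the approach.
```

Because the correction sits on top of the backbone's output, it can be added to an existing forecasting pipeline without retraining the backbone, which is exactly what makes the pattern attractive operationally.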
Upcoming MLOps Events
The MLOps ecosystem continues to grow at breakneck speed, making it ever harder for us as practitioners to stay up to date with relevant developments. A fantastic way to stay on top of relevant resources is through the great community and events that the MLOps and Production ML ecosystem offers. This is the reason why we have started curating a list of upcoming events in the space, outlined below.
Events we are speaking at this year:
Signals Conference - Sept @ Berlin
World Summit AI Europe - Sept @ Amsterdam
Other relevant events:
KubeCon Europe - March @ Amsterdam
PyData Berlin - April @ Frankfurt
Databricks Summit - June @ San Francisco
World Developer Congress - July @ Berlin
EuroPython 2026 - July @ Prague
EuroSciPy 2026 - July @ Krakow
Code.Talks 2026 - Nov @ Hamburg
MLOps World 2026 - Nov @ Austin
In case you missed our talks, check our recordings below:
The State of AI in 2025 - WeAreDevelopers 2025
Prod Generative AI in 2024 - KubeCon AI Day 2025
The State of AI in 2024 - WeAreDevelopers 2024
Responsible AI Workshop Keynote - NeurIPS 2021
Practical Guide to ML Explainability - PyCon London
ML Monitoring: Outliers, Drift, XAI - PyCon Keynote
Metadata for E2E MLOps - Kubecon NA 2022
ML Performance Evaluation at Scale - KubeCon Eur 2021
Industry Strength LLMs - PyData Global 2022
ML Security Workshop Keynote - NeurIPS 2022
Open Source MLOps Tools
Check out the fast-growing ecosystem of production ML tools & frameworks at the GitHub repository, which has reached over 20,000 ⭐ GitHub stars. We are currently looking for more libraries to add - if you know of any that are not listed, please let us know or feel free to open a PR. Here's a few featured open source libraries that we maintain:
KAOS - K8s Agent Orchestration Service for managing the KAOS in large-scale distributed agentic systems.
Kompute - Blazing fast, lightweight and mobile-enabled GPU compute framework optimized for advanced data processing use cases.
Production ML Tools - A curated list of tools to deploy, monitor and optimize machine learning systems at scale.
AI Policy List - A mature list that maps the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and beyond.
Agentic Systems Tools - A new list that aims to map the emerging ecosystem of agentic systems with tools and frameworks for scaling this domain
Please do support some of our open source projects by sharing, contributing or adding a star ⭐
About us
The Institute for Ethical AI & Machine Learning is a European research centre that carries out world-class research into responsible machine learning.
