Part 1 An introduction to machine learning engineering
I'm sure you've seen, like most people in the data science field, the statistics on project failures. Based on my experience, the numbers thrown around for a project getting into production (namely, by vendors promising that their tooling stack will improve your chances if you just pay them!) are ridiculously grim. However, some element of truth exists in the hyperbolic numbers that are referenced in the rates of project failure.
Using machine learning (ML) to solve real-world problems is complex. The sheer volume of tooling, algorithms, and activities involved in building a useful model is daunting for many organizations. In my time working as a data scientist and subsequently helping many dozens of companies build useful ML projects, I've never seen the tooling or the algorithms be the reason a project fails to provide value to a company.
The vast majority of the time, a project that fails to make its way to production for sustained utility has issues that are rooted in the very early phases. Before even a single line of code is written, before a serving architecture is selected and built out, and long before a decision on scalable training is made, a project is doomed to either cancellation or unused obscurity if planning, scoping, and experimentation are not done properly.
From these early stages of project definition, subject-matter expertise review, and reasonable levels of research and testing validation, a coherent project plan and road map can be built that carries the idea of solving a problem to the phase in which an effective solution can be built. In part 1 of this book, we'll go through blueprints showing how to evaluate, plan, and validate a plan for determining the most likely low-risk solution for a problem by using (or not using!) ML.
1 What is a machine learning engineer?
This chapter covers
- The scope of knowledge and skills for machine learning engineers
- The six fundamental aspects of applied machine learning project work
- The functional purpose of machine learning engineers
Machine learning (ML) is exciting. It's fun, challenging, creative, and intellectually stimulating. It also makes money for companies, autonomously tackles overwhelmingly large tasks, and removes the burdensome task of monotonous work from people who would rather be doing something else.
ML is also ludicrously complex. With thousands of algorithms, hundreds of open source packages, and a profession of practitioners required to have a diverse skill set ranging from data engineering (DE) to advanced statistical analysis and visualization, the work required of a professional practitioner of ML is truly intimidating. Adding to that complexity is the need to be able to work cross-functionally with a wide array of specialists, subject-matter experts (SMEs), and business unit groups, communicating and collaborating on both the nature of the problem being solved and the output of the ML-backed solution.
ML engineering applies a system around this staggering level of complexity. It uses a set of standards, tools, processes, and methodology that aims to minimize the chances of abandoned, misguided, or irrelevant work being done in an effort to solve a business problem or need. It, in essence, is the road map to creating ML-based systems that can be not only deployed to production, but also maintained and updated for years in the future, allowing businesses to reap the rewards in efficiency, profitability, and accuracy that ML in general has proven to provide (when done correctly).
This book is, at its essence, that very road map. It's a guide to help you navigate the path of developing production-capable ML solutions. Figure 1.1 shows the major elements of ML project work covered throughout this book. We'll move through these proven sets of processes (mostly a "lessons learned" from things I've screwed up in my career) to give a framework for solving business problems through the application of ML.
Figure 1.1 The ML engineering road map for project work
This path for project work is not meant to focus solely on the tasks that should be done at each phase. Rather, it is the methodology within each stage (the "why are we doing this" element) that enables successful project work.
The end goal of ML work is, after all, about solving a problem. The most effective way to solve those business problems that we're all tasked with as data science (DS) practitioners is to follow a process designed around preventing rework, confusion, and complexity. By embracing the concepts of ML engineering and following the road of effective project work, the path to a useful modeling solution can be shorter, far cheaper, and much more likely to succeed than if you just wing it and hope for the best.
1.1 Why ML engineering?
To put it most simply, ML is hard. It's even harder to do correctly in the sense of serving relevant predictions, at scale, with reliable frequency. With so many specialties existing in the field (natural language processing (NLP), forecasting, deep learning, and traditional linear and tree-based modeling, among others), an enormous focus on active research, and so many algorithms that have been built to solve specific problems, it's remarkably challenging to learn even slightly more than an insignificant fraction of all there is to learn about the field. Understanding the theoretical and practical aspects of applied ML is challenging and time-consuming.
However, none of that knowledge helps in building interfaces between the model solution and the outside world. Nor does it help inform development patterns that ensure maintainable and extensible solutions.
Data scientists are also expected to be familiar with additional realms of competency. From mid-level DE skills (you have to get your data for your data science from somewhere, right?) to software development, project management, visualization, and presentation skills, the list grows ever longer, and the volumes of experience that need to be gained become rather daunting. It's not much of a surprise, considering all of this, that "just figuring it out" in reference to all the required skills to create production-grade ML solutions is untenable.
The aim of ML engineering is not to iterate through the lists of skills just mentioned and require that a data scientist master each of them. Instead, ML engineering collects certain aspects of those skills, carefully crafted to be relevant to data scientists, all with the goal of increasing the chances of getting an ML project into production and making sure that it's not a solution that needs constant maintenance and intervention to keep running.
ML engineers, after all, don't need to be able to create applications and software frameworks for generic algorithmic use cases. They're also not likely to be writing their own large-scale streaming ingestion extract, transform, and load (ETL) pipelines. They similarly don't need to be able to create detailed and animated frontend visualizations in JavaScript.
ML engineers need to know just enough software development skills to be able to write modular code and implement unit tests. They don't need to know about the intricacies of non-blocking asynchronous message brokering. They need just enough data engineering skills to build (and schedule the ETL for) feature datasets for their models, but not to construct a petabyte-scale streaming ingestion framework. They need just enough visualization skills to create plots and charts that communicate clearly what their research and models are doing, but not to develop dynamic web apps that have complex user-experience (UX) components. They also need just enough project management experience to know how to properly define, scope, and control a project to solve a problem, but need not go through a Project Management Professional (PMP) certification.
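To make "modular code and unit tests" a little more concrete, here is a minimal sketch in Python. The function names (`fill_missing`, `min_max_scale`, `build_feature`) and the feature logic are hypothetical, invented purely for illustration; they are not taken from any particular library or from a later chapter. The point is only the shape: small single-purpose functions, composed into one step, with tests that pin down the expected behavior.

```python
import unittest


def fill_missing(values, default=0.0):
    """Replace None entries with a default so later steps see only floats."""
    return [default if v is None else float(v) for v in values]


def min_max_scale(values):
    """Scale values to the [0, 1] range; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def build_feature(raw_values):
    """Compose the small steps above into one (hypothetical) feature builder."""
    return min_max_scale(fill_missing(raw_values))


class BuildFeatureTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(build_feature([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_missing_values_use_default(self):
        self.assertEqual(build_feature([None, 2, 4]), [0.0, 0.5, 1.0])

    def test_constant_column_does_not_divide_by_zero(self):
        self.assertEqual(build_feature([3, 3, 3]), [0.0, 0.0, 0.0])
```

Saved as a module, the tests can be run with `python -m unittest`. Because each step is its own function, swapping the scaling logic later means changing one function and rerunning the same tests. This sketch deliberately leaves out cases such as an empty input list.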
A giant elephant remains in the room when it comes to ML. Specifically, why, with so many companies going all in on ML, hiring massive teams of highly compensated data scientists, and devoting enormous amounts of financial and temporal resources to projects, do so many endeavors end up failing? Figure 1.2 depicts rough estimates of what I've come to see as the six primary reasons projects fail (and the rates of these failures in any given industry, from my experience, are truly surprising).
Figure 1.2 My estimation of why ML projects fail, from the hundreds I've worked on and advised others on
Throughout this first part of the book, we'll discuss how to identify the reasons so many projects fail, are abandoned, or take far longer than they should to reach production. We'll also discuss the solutions to each of these common failures and cover the processes that can significantly lower the chances of these factors derailing your projects.
Generally, these failures happen because the DS team is either inexperienced with solving a problem of the scale required (a technological or process-driven failure) or hasn't fully understood the desired outcome from the business (a communication-driven failure). I've never seen this happen because of malicious intent. Rather, most ML projects are incredibly challenging, complex, and composed of algorithmic software tooling that is hard to explain to a layperson, hence the breakdowns in communication with business units that most projects endure.
Adding to the complexity of ML projects are two other critical elements that are not shared by (most) traditional software development projects: a frequent lack of detail in project expectations and the relative industry immaturity in tooling. Both aspects are no different from the state of software engineering in the early 1990s. Businesses then were unsure of how to best leverage new aspects of technological capability, tooling was woefully underdeveloped, and many projects failed to meet the expectations of those who were commissioning engineering teams to build them. ML work is (from my biased view of working with only so many companies) at th...