eBook - ePub

Machine Learning Engineering in Action

Name: Machine Learning Engineering in Action
Author: Ben Wilson

576 pages
English

About This Book

Field-tested tips, tricks, and design patterns for building machine learning projects that are deployable, maintainable, and secure from concept to production.

In Machine Learning Engineering in Action, you will learn:

Evaluating data science problems to find the most effective solution
Scoping a machine learning project for usage expectations and budget
Process techniques that minimize wasted effort and speed up production
Assessing a project using standardized prototyping work and statistical validation
Choosing the right technologies and tools for your project
Making your codebase more understandable, maintainable, and testable
Automating your troubleshooting and logging practices

Ferrying a machine learning project from your data science team to your end users is no easy task. Machine Learning Engineering in Action will help you make it simple. Inside, you'll find fantastic advice from veteran industry expert Ben Wilson, Principal Resident Solutions Architect at Databricks. Ben introduces his personal toolbox of techniques for building deployable and maintainable production machine learning systems. You'll learn the importance of Agile methodologies for fast prototyping and conferring with stakeholders, while developing a new appreciation for the importance of planning. Adopting well-established software development standards will help you deliver better code management, and make it easier to test, scale, and even reuse your machine learning code. Every method is explained in a friendly, peer-to-peer style and illustrated with production-ready source code.

About the technology

Deliver maximum performance from your models and data. This collection of reproducible techniques will help you build stable data pipelines, efficient application workflows, and maintainable models every time. Based on decades of good software engineering practice, machine learning engineering ensures your ML systems are resilient, adaptable, and perform in production.

About the book

Machine Learning Engineering in Action teaches you core principles and practices for designing, building, and delivering successful machine learning projects. You'll discover software engineering techniques like conducting experiments on your prototypes and implementing modular design that result in resilient architectures and consistent cross-team communication. Based on the author's extensive experience, every method in this book has been used to solve real-world projects.

What's inside

Scoping a machine learning project for usage expectations and budget
Choosing the right technologies for your design
Making your codebase more understandable, maintainable, and testable
Automating your troubleshooting and logging practices

About the reader

For data scientists who know machine learning and the basics of object-oriented programming.

About the author

Ben Wilson is Principal Resident Solutions Architect at Databricks, where he developed the Databricks Labs AutoML project, and is an MLflow committer.

Information

Publisher: Manning
Year: 2022
ISBN: 9781638356585
Topic: Computer Science
Subtopic: Data Processing

Part 1 An introduction to machine learning engineering

I’m sure you’ve seen, like most people in the data science field, the statistics on project failures. Based on my experience, the numbers thrown around for a project getting into production (namely, by vendors promising that their tooling stack will improve your chances if you just pay them!) are ridiculously grim. However, some element of truth exists in the hyperbolic numbers that are referenced in the rates of project failure.

Using machine learning (ML) to solve real-world problems is complex. The sheer volume of tooling, algorithms, and activities involved in building a useful model is daunting for many organizations. In my time working as a data scientist and subsequently helping many dozens of companies build useful ML projects, I’ve never seen the tooling or the algorithms be the reason a project fails to provide value to a company.

The vast majority of the time, a project that fails to make its way to production for sustained utility has issues that are rooted in the very early phases. Before even a single line of code is written, before a serving architecture is selected and built out, and long before a decision on scalable training is made, a project is doomed to either cancellation or unused obscurity if planning, scoping, and experimentation are not done properly.

From these early stages of project definition, subject-matter expertise review, and reasonable levels of research and testing validation, a coherent project plan and road map can be built that carries the idea of solving a problem to the phase in which an effective solution can be built. In part 1 of this book, we’ll go through blueprints showing how to evaluate, plan, and validate a plan for determining the most likely low-risk solution for a problem by using (or not using!) ML.

1 What is a machine learning engineer?

This chapter covers

The scope of knowledge and skills for machine learning engineers
The six fundamental aspects of applied machine learning project work
The functional purpose of machine learning engineers

Machine learning (ML) is exciting. It’s fun, challenging, creative, and intellectually stimulating. It also makes money for companies, autonomously tackles overwhelmingly large tasks, and lifts the burden of monotonous work from people who would rather be doing something else.

ML is also ludicrously complex. With thousands of algorithms, hundreds of open source packages, and a profession whose practitioners are required to have a diverse skill set ranging from data engineering (DE) to advanced statistical analysis and visualization, the work required of a professional practitioner of ML is truly intimidating. Adding to that complexity is the need to be able to work cross-functionally with a wide array of specialists, subject-matter experts (SMEs), and business unit groups, communicating and collaborating on both the nature of the problem being solved and the output of the ML-backed solution.

ML engineering applies a system around this staggering level of complexity. It uses a set of standards, tools, processes, and methodology that aims to minimize the chances of abandoned, misguided, or irrelevant work being done in an effort to solve a business problem or need. It, in essence, is the road map to creating ML-based systems that can be not only deployed to production, but also maintained and updated for years in the future, allowing businesses to reap the rewards in efficiency, profitability, and accuracy that ML in general has proven to provide (when done correctly).

This book is, at its essence, that very road map. It’s a guide to help you navigate the path of developing production-capable ML solutions. Figure 1.1 shows the major elements of ML project work covered throughout this book. We’ll move through these proven sets of processes (mostly a “lessons learned” from things I’ve screwed up in my career) to give a framework for solving business problems through the application of ML.

Figure 1.1 The ML engineering road map for project work

This path for project work is not meant to focus solely on the tasks that should be done at each phase. Rather, it is the methodology within each stage (the “why are we doing this” element) that enables successful project work.

The end goal of ML work is, after all, solving a problem. The most effective way to solve those business problems that we’re all tasked with as data science (DS) practitioners is to follow a process designed around preventing rework, confusion, and complexity. By embracing the concepts of ML engineering and following the road of effective project work, the path to a useful modeling solution can be shorter and far cheaper, with a much higher probability of succeeding than if you just wing it and hope for the best.

1.1 Why ML engineering?

To put it most simply, ML is hard. It’s even harder to do correctly in the sense of serving relevant predictions, at scale, with reliable frequency. With so many specialties existing in the field—such as natural language processing (NLP), forecasting, deep learning, and traditional linear and tree-based modeling—an enormous focus on active research, and so many algorithms that have been built to solve specific problems, it’s remarkably challenging to learn even slightly more than an insignificant fraction of all there is to learn about the field. Understanding the theoretical and practical aspects of applied ML is challenging and time-consuming.

However, none of that knowledge helps in building interfaces between the model solution and the outside world. Nor does it help inform development patterns that ensure maintainable and extensible solutions.

Data scientists are also expected to be familiar with additional realms of competency. The list includes mid-level DE skills (you have to get your data for your data science from somewhere, right?), software development skills, project management skills, visualization skills, and presentation skills. It grows ever longer, and the volume of experience that needs to be gained becomes rather daunting. It’s not much of a surprise, considering all of this, that “just figuring it out” in reference to all the required skills to create production-grade ML solutions is untenable.

The aim of ML engineering is not to iterate through the lists of skills just mentioned and require that a data scientist master each of them. Instead, ML engineering collects certain aspects of those skills, carefully crafted to be relevant to data scientists, all with the goal of increasing the chances of getting an ML project into production and making sure that it’s not a solution that needs constant maintenance and intervention to keep running.

ML engineers, after all, don’t need to be able to create applications and software frameworks for generic algorithmic use cases. They’re also not likely to be writing their own large-scale streaming ingestion extract, transform, and load (ETL) pipelines. They similarly don’t need to be able to create detailed and animated frontend visualizations in JavaScript.

ML engineers need to know just enough software development skills to be able to write modular code and implement unit tests. They don’t need to know about the intricacies of non-blocking asynchronous message brokering. They need just enough data engineering skills to build (and schedule the ETL for) feature datasets for their models, but not to construct a petabyte-scale streaming ingestion framework. They need just enough visualization skills to create plots and charts that communicate clearly what their research and models are doing, but not to develop dynamic web apps that have complex user-experience (UX) components. They also need just enough project management experience to know how to properly define, scope, and control a project to solve a problem, but need not go through a Project Management Professional (PMP) certification.
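As a rough illustration of what "modular code and unit tests" can mean at this level, here is a minimal Python sketch. The function names (clip_outliers, standardize, build_feature) and the feature logic itself are hypothetical, invented for this example and not taken from the book. The point is only that small, single-purpose functions can each be checked in isolation.

```python
import math
import unittest


def clip_outliers(values, lower, upper):
    """Clamp every value into the closed range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return [min(max(v, lower), upper) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit population standard deviation."""
    if not values:
        return []
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant column carries no signal; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def build_feature(values, lower, upper):
    """Compose the two steps above into one hypothetical feature transformation."""
    return standardize(clip_outliers(values, lower, upper))


class FeatureTests(unittest.TestCase):
    def test_clip_outliers_clamps_both_ends(self):
        self.assertEqual(clip_outliers([-5, 0, 5, 50], 0, 10), [0, 0, 5, 10])

    def test_clip_outliers_rejects_inverted_bounds(self):
        with self.assertRaises(ValueError):
            clip_outliers([1, 2, 3], 10, 0)

    def test_standardize_handles_constant_input(self):
        self.assertEqual(standardize([4.0, 4.0, 4.0]), [0.0, 0.0, 0.0])

    def test_build_feature_composes_steps(self):
        # Clipping [0, 100] into [1, 3] gives [1, 3], which standardizes to [-1, 1].
        self.assertEqual(build_feature([0.0, 100.0], 1.0, 3.0), [-1.0, 1.0])
```

Saved in a module, tests like these can be run with Python's built-in runner (python -m unittest). Because each function does one thing, a failing test points directly at the step that broke instead of at an opaque end-to-end pipeline.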

A giant elephant remains in the room when it comes to ML. Specifically, why—with so many companies going all in on ML, hiring massive teams of highly compensated data scientists, and devoting enormous amounts of financial and temporal resources to projects—do so many endeavors end up failing? Figure 1.2 depicts rough estimates of what I’ve come to see as the six primary reasons projects fail (and the rates of these failures in any given industry, from my experience, are truly surprising).

Figure 1.2 My estimation of why ML projects fail, from the hundreds I’ve worked on and advised others on

Throughout this first part of the book, we’ll discuss how to identify the reasons so many projects fail, are abandoned, or take far longer than they should to reach production. We’ll also discuss the solutions to each of these common failures and cover the processes that can significantly lower the chances of these factors derailing your projects.

Generally, these failures happen because the DS team is either inexperienced with solving a problem of the scale required (a technological or process-driven failure) or hasn’t fully understood the desired outcome from the business (a communication-driven failure). I’ve never seen this happen because of malicious intent. Rather, most ML projects are incredibly challenging, complex, and composed of algorithmic software tooling that is hard to explain to a layperson—hence the breakdowns in communication with business units that most projects endure.

Adding to the complexity of ML projects are two other critical elements that are not shared by (most) traditional software development projects: a frequent lack of detail in project expectations and the relative industry immaturity in tooling. Both aspects are no different from the state of software engineering in the early 1990s. Businesses then were unsure of how to best leverage new aspects of technological capability, tooling was woefully underdeveloped, and many projects failed to meet the expectations of those who were commissioning engineering teams to build them. ML work is (from my biased view of working with only so many companies) at th...