Applied Machine Learning Solutions with Python
eBook - ePub

Production-ready ML Projects Using Cutting-edge Libraries and Powerful Statistical Techniques (English Edition)

Siddhanta Bhatta


About this book

A problem-focused guide for tackling industrial machine learning issues with methods and frameworks chosen by experts.

Key Features
  • Popular techniques for problem formulation, data collection, and data cleaning in machine learning.
  • Comprehensive and useful machine learning tools such as MLFlow, Streamlit, and many more.
  • Covers numerous machine learning libraries, including Tensorflow, FastAI, Scikit-Learn, Pandas, and Numpy.

Description
This book discusses how to apply machine learning to real-world problems using real-world data. In this book, you will investigate data sources, become acquainted with data pipelines, and practice how machine learning works through numerous examples and case studies. The book begins with high-level concepts and implementation (with code!) and progresses towards real-world ML systems. It briefly discusses various concepts of statistics and linear algebra. You will learn how to formulate a problem, collect data, build a model, and tune it. You will learn about use cases for data analytics, computer vision, and natural language processing. You will also explore nonlinear architectures, enabling you to build models with multiple inputs and outputs. You will also learn about creating a machine learning profile, various machine learning libraries, statistics, and FastAPI. Throughout the book, you will use Python to experiment with machine learning libraries such as Tensorflow, Scikit-learn, Spacy, and FastAI. The book will help you train models on both Kaggle datasets and your own datasets.

What you will learn
  • Construct a machine learning problem, evaluate the feasibility, and gather and clean data.
  • Learn to explore data first, select, and train machine learning models.
  • Fine-tune the chosen model, deploy, and monitor it in production.
  • Discover popular models for data analytics, computer vision, and Natural Language Processing.

Who this book is for
This book caters to beginners in machine learning, software engineers, and students who want to gain a good understanding of machine learning concepts and create production-ready ML systems. This book assumes you have a beginner-level understanding of Python.

Table of Contents
1. Introduction to Machine Learning
2. Problem Formulation in Machine Learning
3. Data Acquisition and Cleaning
4. Exploratory Data Analysis
5. Model Building and Tuning
6. Taking Our Model into Production
7. Data Analytics Use Case
8. Building a Custom Image Classifier from Scratch
9. Building a News Summarization App Using Transformers
10. Multiple Inputs and Multiple Output Models
11. Contributing to the Community
12. Creating Your Project
13. Crash Course in Numpy, Matplotlib, and Pandas
14. Crash Course in Linear Algebra and Statistics
15. Crash Course in FastAPI

About the Author
Siddhanta Bhatta is a Machine Learning engineer with 6 years of experience in building machine learning products. He is currently working as a Senior Software Engineer in Data Analytics, Machine Learning, and Deep Learning. He has built multiple data apps in various domains such as vision, NLP, Data Analytics, and many more. He is a Microsoft-certified data scientist who believes in data literacy.
LinkedIn Profile: https://www.linkedin.com/in/siddhanta-bhatta-377880a7/
Blog Link: https://joyofunderstanding926957091.wordpress.com/

Information

Year
2021
ISBN
9789391030438

CHAPTER 1

Introduction to Machine Learning

Machine learning is no longer just part of sci-fi movies. It is already here. Not the kind shown in movies, though; our models will not plot against humanity because they concluded that humans are destroying themselves, as in the movie I, Robot. At least not yet. But what exactly is machine learning? How does it work? How can a piece of code learn? Where does it end? What are its capabilities, and how can we use it? We will go through these questions and why you would want to use it too.
This chapter will also introduce a lot of machine learning jargon and give a bird's-eye view of the types and landscape of machine learning through an example. We will learn about supervised, unsupervised, and semi-supervised machine learning. We will go through a few modern examples, such as Gmail's Smart Compose feature and Netflix recommendations, and learn how they work, to understand how important machine learning is and what it is capable of. We will cover the key ideas rather than the exact implementation, to give you a feel for, and an appreciation of, machine learning. This chapter focuses on basic motivation and ideas rather than implementation.
At the end, we will go through the skill set required to do machine learning effectively in industry. We will go through some Python programming concepts quickly, as this book assumes you have working Python knowledge. Many of you might feel that this is inadequate and that there is a lot more to Python programming than what is listed here. That is true. So, I recommend going through the prerequisite section in the book's preface to get a better understanding of Python programming. With that in mind, let us get started!

Structure

In this chapter, we will cover the following topics:
  • What is machine learning?
  • Some machine learning jargons
  • Machine learning definitions
  • Why should we use it?
  • Types of machine learning
  • How much do I need to know to do machine learning?

Objective

After studying this chapter, you should be able to:
  • Understand what machine learning is
  • Get familiar with some machine learning jargon
  • Understand the importance of machine learning in the current time

1.1 What is machine learning?

The way we approach problems in traditional software engineering is as follows:
  • We are given a problem: we want some output and are given some constraints.
  • Then, we think about taking the input and writing rules to get the output (keeping the constraints satisfied).
  • Then, we test our code, find errors, modify our rules, or update the understanding of the problem. In some problems, we change the problem a bit or add new inputs. Thus, it is a flexible approach.
The following is a short diagram of traditional programming:
Figure 1.1: Traditional programming
This works out in most cases, and saying so is an understatement. We were solving problems using this approach long before machine learning came into existence. A simple diagram of input and output is hardly representative of what programming is, but for discussing how machine learning differs from traditional programming, this definition will suffice. We emphasize this because many people (myself included, some time back) think machine learning is a Swiss Army knife, and I find people trying to solve problems that are not fit for machine learning. This way of approaching problems is dangerous. Traditional programming, done right, can also feel like magic. The beauty is in the flexibility.
Keeping all this in mind, let's understand what machine learning is in the context of traditional programming, using one of the most used examples in machine learning: a spam filter or classifier. A spam filter should separate those pesky spam emails from ham (we call non-spam emails ham). First, let's do it traditionally. We can analyze the problem and think of the inputs first.
Figure 1.2: Spam vs. ham classifier
How do we categorize an email as spam? We can start by looking at a few spam and ham emails. What immediately comes to mind is product promotion emails. Now let us think about what input we have: emails. They carry a lot of information, such as the From address, subject, body, and so on, so we can take those as input. Let's say we know that every email from a certain email ID is spam (From as input). Then we can write a rule: read the email, parse the From field, and check who sent it. If the sender matches the spammer, mark the email as spam; otherwise, mark it as ham. Voilà!
But wait, what if another email address also belongs to a spammer? What about a spammer who sometimes sends ham? We can know that only by reading the content. Or, as in the following image, an email from a good domain may contain spam in its body. Again, we can create some complex logic to handle it (check for certain keywords, and if those are present, mark the email as spam, else ham). But when shall we stop? We don't know all the emails. We can't possibly know what all spam or ham emails look like. We only have a dataset containing some spam and some ham.
Figure 1.3: Limitation of traditional programming in data-driven problem
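The rule-based approach described above can be sketched in a few lines of Python. This is only an illustration: the blocked sender, the keyword list, and the function name `is_spam_by_rules` are all made up for this example and do not come from the book.

```python
# Hypothetical hand-written rules; the addresses and keywords are invented for illustration.
BLOCKED_SENDERS = {"deals@spammer.example"}
SPAM_KEYWORDS = ("free money", "act now", "winner")


def is_spam_by_rules(sender: str, subject: str, body: str) -> bool:
    """Return True if one of the hand-written rules flags the email as spam."""
    # Rule 1: the sender is a known spammer.
    if sender.lower() in BLOCKED_SENDERS:
        return True
    # Rule 2: the subject or body contains a known spam phrase.
    text = f"{subject} {body}".lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


# A known spammer is caught by rule 1.
print(is_spam_by_rules("deals@spammer.example", "Hello", "Just checking in"))
# An ordinary email passes both rules.
print(is_spam_by_rules("friend@good.example", "Lunch?", "Are you free at noon?"))
# A promotional email from an unknown sender with unseen wording slips through.
print(is_spam_by_rules("new@unknown.example", "Exclusive offer", "Limited discount inside"))
```

The last call is the interesting one: the email is promotional, but neither rule anticipates it, so we would have to keep adding rules for every new sender and phrase.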
What we can do is mimic, through a model, what we ourselves do. Many of you might think that's not possible: a model like this would have to mimic human intelligence, which is so hard to comprehend. But wait, we don't need to do that to solve the spam and ham problem. And we don't even need to be 100% correct. Let's say that, out of 100 spam emails, our model filters out 80, and the remaining 20 are marked as ham even though they are spam. We call those false negatives**. They are falsely identified as ham (negative) even though they are spam (positive). And it's better to have 20 spam emails to delete than 100. That's where metrics come in; we set expectations in the user's mind through a metric, and we try to optimize it. Optimization is a huge part of machine learning.
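The arithmetic in this example can be written out as a small sketch. The variable names are our own, and the fraction of spam that gets caught is commonly called recall, a term this section has not introduced.

```python
# Numbers from the example in the text: 100 spam emails, 80 of them filtered out.
total_spam = 100
caught_spam = 80

# Spam emails that the model wrongly marked as ham.
false_negatives = total_spam - caught_spam

# Fraction of spam that was caught (commonly called recall; the name is not from the text).
caught_fraction = caught_spam / total_spam

print(false_negatives)   # 20
print(caught_fraction)   # 0.8
```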
With all this, do you see the difference in the approach? I never said what the model is. In machine learning, we'll define a crude** model and feed it a bunch of data. Then we try to optimize the metric, and the crude model auto-adjusts/learns** to do that. This is a beginner's example of machine learning. There are many other types, but it's a start.
The following is a diagram of how machine learning works in the crudest sense (we will update this diagram eventually):
Figure 1.4: A simple machine learning process
Note: ** It would not be wrong to call these false positives instead: we could take ham as positive and spam as negative. But in general, we take the data of interest as positive, which in our case is spam emails. There are other reasons too, which we will discuss later.
**We will dive deeper into what crude means later; for now, crude means general: a model that can be used for a lot of similar problems.
**Machine learning jargon uses 'learn' instead of 'auto-adjust'; I find auto-adjust closer to what we actually do. Although it resembles how learning works, I find it ludicrous to compare it to human learning. But we will use 'learn' from now on, since standards are important.

1.2 Some machine learning jargons

The preceding spam and ham problem gave us a few machine learning terms; throughout this book, we will come across much more jargon, and knowing it will make your life easier when studying machine learning. This is vocabulary every data scientist or machine learning engineer should know.
Remember, we learned that a machine learning model auto-adjusts to the input data to produce your specific output. We call that input data a "training set." This is the data that our model or algorithm uses to tweak the knobs and dials in the model to get the desired output. We call those knobs and dials the "parameters" of a model, and such a model is aptly called a "parametric model." There are also models that do not have a fixed set of knobs or dials; we call those "non-parametric models." We will go through some non-parametric models in the upcoming chapters.
Since we need to convince the user (in our problem context, the email user) that our model is useful, we need to give them a "metric" by which they can judge it. In our case, we talked about false negatives. Maybe the user is also interested in how many emails our model got right and wrong. We can use a metric called "accuracy," defined as the number of data points our model identified correctly (spam as spam, ham as ham) divided by the total number of examples.
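As a minimal sketch of this definition, accuracy can be computed by counting matching labels. The function name `accuracy` and the five example labels are invented for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of examples whose predicted label equals the real label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    # Count the positions where the model's label agrees with the real one.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: the model gets four of the five emails right.
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(predicted, actual))  # 0.8
```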
But wait, if we report this accuracy on the training set, our model can cheat by memorizing it. There would be nothing wrong with that if all your future emails matched your dataset exactly, which is impossible. So, we keep aside some data on which we don't tweak the parameters or train the model. We call that part of the data a "test set." It is sometimes also called a "validation set" or "hold-out set," but there is a subtle difference between a test set and a validation set. We will go through that when we learn about hyperparameter tuning. All of these may sound like alien words to a beginner in machine learning. But trust me, they are simple ideas that work.
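Holding out a test set can be sketched with the standard library alone. The helper `train_test_split` below is our own hypothetical function (libraries such as Scikit-learn ship a more complete one), and the 20% test fraction and the ten placeholder emails are arbitrary choices for illustration.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction of them as a test set."""
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # The model is trained only on the first part; the held-out part is used for evaluation.
    return shuffled[n_test:], shuffled[:n_test]


# Ten made-up (email, label) pairs.
emails = [(f"email {i}", "spam" if i % 3 == 0 else "ham") for i in range(10)]
train_set, test_set = train_test_split(emails)
print(len(train_set), len(test_set))  # 8 2
```

Because the two parts never overlap, accuracy measured on `test_set` cannot be inflated by memorizing the training emails.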
In the current example, we need to tweak the model to give us the output we want. And since we are not relying on rules, we need to tell our model how well it performs as its parameters are tweaked. That's where the "loss function" comes into the picture. We start with an initial set of parameters and compute the output; then, we compare it with our desired output through a loss function. Then, we try to minimize that loss function by changing the parameters.
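To make the loop of "start with initial parameters, measure the loss, change the parameters" concrete, here is a small sketch under our own assumptions. It uses a two-parameter logistic model, log loss as the loss function, and plain gradient descent; none of these specific choices come from the text, and the six data points are invented.

```python
import math

# Made-up training set: feature = number of promotional words, label 1 = spam, 0 = ham.
features = [0.0, 1.0, 2.0, 5.0, 6.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]


def predict(weight, bias, x):
    """Model output: estimated probability that an email with feature x is spam."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def log_loss(weight, bias):
    """Loss function: average penalty for disagreeing with the desired labels."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in zip(features, labels):
        p = min(max(predict(weight, bias, x), eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(features)


def train(steps=2000, learning_rate=0.1):
    """Start from initial parameters and repeatedly nudge them to reduce the loss."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        # Gradients of the log loss with respect to each parameter.
        errors = [predict(weight, bias, x) - y for x, y in zip(features, labels)]
        grad_w = sum(e * x for e, x in zip(errors, features)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


initial_loss = log_loss(0.0, 0.0)
weight, bias = train()
final_loss = log_loss(weight, bias)
print(initial_loss, final_loss)  # the loss after training is lower than at the start
```

Here `weight` and `bias` play the role of the knobs and dials: nobody writes a rule for them, they are adjusted only because doing so lowers the loss.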
What is the desired output in our example, and how do we get it? In our example, the desired output is 'spam' for spam emails and 'ham' for all others. So, someone needs to collect some historical emails and tag them manually with the desired output so that the machine learning model can learn from them. This process is called "annotation." This type of machine learning is called "supervised machine learning," where you need to tell the model what output you want in addition to the input. Many of you might think this is useless, as you need to tell the model the answers beforehand, but it is not. The way machine learning works is called "representation learning": the model picks up the hidden rules/representations that map the input to the output. So even when new emails arrive that our model was not trained on, it can still classify them.
The actual output is dirty (because of noise), and a good model will be robust against noise. For example, our spam filter dataset can pick up noise during the annotation process: the annotator reads a spam email and marks it as ham due to human error. There are many ways noise can occur in a dataset. We will learn about those in detail.

1.3 Machine learning definition

I love t...

Citation styles for Applied Machine Learning Solutions with Python

APA 6 Citation

Bhatta, S. (2021). Applied Machine Learning Solutions with Python ([edition unavailable]). BPB Publications. Retrieved from https://www.perlego.com/book/2905412/applied-machine-learning-solutions-with-python-productionready-ml-projects-using-cuttingedge-libraries-and-powerful-statistical-techniques-english-edition-pdf (Original work published 2021)