Ensemble Machine Learning Cookbook
eBook - ePub

Ensemble Machine Learning Cookbook

Over 35 practical recipes to explore ensemble machine learning techniques using Python

Dipayan Sarkar, Vijayalakshmi Natarajan

  1. 336 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Implement machine learning algorithms to build ensemble models using Keras, H2O, Scikit-Learn, Pandas and more

Key Features

  • Apply popular machine learning algorithms using a recipe-based approach
  • Implement boosting, bagging, and stacking ensemble methods to improve machine learning models
  • Discover real-world ensemble applications and encounter complex challenges in Kaggle competitions

Book Description

Ensemble modeling is an approach used to improve the performance of machine learning models. It combines two or more machine learning algorithms, similar or dissimilar, to deliver superior predictive power. This book will help you to implement popular machine learning algorithms to cover different paradigms of ensemble machine learning, such as boosting, bagging, and stacking.

The Ensemble Machine Learning Cookbook will start by getting you acquainted with the basics of ensemble techniques and exploratory data analysis. You'll then learn to implement tasks related to statistical and machine learning algorithms to understand ensembles of multiple heterogeneous algorithms. It will also ensure that you don't miss out on key topics, such as resampling methods. As you progress, you'll get a better understanding of bagging, boosting, stacking, and working with the Random Forest algorithm using real-world examples. The book will highlight how these ensemble methods use multiple models to improve machine learning results, as compared to a single model. In the concluding chapters, you'll delve into advanced ensemble models using neural networks, natural language processing, and more. You'll also implement models for tasks such as fraud detection, text categorization, and sentiment analysis.

By the end of this book, you'll be able to harness ensemble techniques and the working mechanisms of machine learning algorithms to build intelligent models using individual recipes.

What you will learn

  • Understand how to use machine learning algorithms for regression and classification problems
  • Implement ensemble techniques such as averaging, weighted averaging, and max-voting
  • Get to grips with advanced ensemble methods, such as bootstrapping, bagging, and stacking
  • Use Random Forest for tasks such as classification and regression
  • Implement an ensemble of homogeneous and heterogeneous machine learning algorithms
  • Learn and implement various boosting techniques, such as AdaBoost, Gradient Boosting Machine, and XGBoost

Who this book is for

This book is designed for data scientists, machine learning developers, and deep learning enthusiasts who want to delve into machine learning algorithms to build powerful ensemble models. Working knowledge of Python programming and basic statistics is a must to help you grasp the concepts in the book.


Statistical and Machine Learning Algorithms

In this chapter, we will cover the following recipes:
  • Multiple linear regression
  • Logistic regression
  • Naive Bayes
  • Decision trees
  • Support vector machines

Technical requirements

The technical requirements for this chapter remain the same as those we detailed in Chapter 1, Get Closer to Your Data.
Visit the GitHub repository to get the dataset and the code. These are arranged by chapter and by the name of the topic. For the linear regression dataset and code, for example, visit .../Chapter 3/Linear regression.

Multiple linear regression

Multiple linear regression is a technique used to train a linear model that assumes linear relationships between multiple predictor variables ($X_1, X_2, \dots, X_m$) and a continuous target variable ($Y$). The general equation for a multiple linear regression with $m$ predictor variables is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m + \epsilon$$

Training a linear regression model involves estimating the values of the coefficients for each of the predictor variables, denoted by the letter $\beta$. In the preceding equation, $\epsilon$ denotes an error term, which is normally distributed with zero mean and constant variance. This is represented as follows:

$$\epsilon \sim N(0, \sigma^2)$$
Various techniques can be used to build a linear regression model. The most frequently used is the ordinary least squares (OLS) estimate. The OLS method produces a linear regression line that minimizes the sum of squared errors. The error is the distance from an actual data point to the regression line. The sum of squared errors measures the aggregate of the squared differences between the training instances, which are each of our data points, and the values predicted by the regression line. This can be represented as follows:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

In the preceding equation, $y_i$ is the actual training instance and $\hat{y}_i$ is the value predicted by the regression line.
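The sum of squared errors above can be computed directly. The following is a minimal sketch with NumPy on made-up numbers (not data from the book's dataset):

```python
import numpy as np

# Actual target values (y_i) and values predicted by a regression line (y_hat_i)
y_actual = np.array([3.0, 5.0, 7.0])
y_predicted = np.array([2.5, 5.5, 6.5])

# Sum of squared errors: aggregate of the squared differences
sse = np.sum((y_actual - y_predicted) ** 2)
print(sse)  # 0.75
```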
In the context of machine learning, gradient descent is a common technique that can be used to optimize the coefficients of predictor variables by minimizing the training error of the model through multiple iterations. Gradient descent starts by initializing the coefficients to zero. Then, the coefficients are updated with the intention of minimizing the error. Updating the coefficients is an iterative process and is performed until a minimum squared error is achieved.
In the gradient descent technique, a hyperparameter called the learning rate, denoted by $\alpha$, is provided to the algorithm. This parameter determines how fast the algorithm moves toward the optimal values of the coefficients. If $\alpha$ is very large, the algorithm might skip the optimal solution. If it is too small, however, the algorithm might need too many iterations to converge to the optimal coefficient values. For this reason, it is important to use the right value for $\alpha$.
In this recipe, we will use the gradient descent method to train our linear regression model.
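As a rough illustration of the mechanics described above (this is not the book's own implementation), batch gradient descent for a linear model can be sketched as follows; the coefficients start at zero and are updated iteratively against synthetic data:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Fit linear regression coefficients by batch gradient descent.

    X: (n_samples, n_features) predictor matrix; y: (n_samples,) target.
    alpha is the learning rate; coefficients are initialized to zero.
    """
    n_samples, n_features = X.shape
    # Prepend a column of ones so the intercept is learned like any coefficient
    Xb = np.hstack([np.ones((n_samples, 1)), X])
    beta = np.zeros(n_features + 1)
    for _ in range(n_iters):
        error = Xb @ beta - y          # residuals for current coefficients
        gradient = (2.0 / n_samples) * Xb.T @ error  # gradient of mean squared error
        beta -= alpha * gradient       # step toward lower error
    return beta

# Toy data generated from y = 1 + 2*x: the estimates should approach [1, 2]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
coeffs = gradient_descent(X, y, alpha=0.05, n_iters=5000)
print(coeffs)
```

Raising `alpha` too far makes the updates diverge, while a very small value converges slowly, which is the trade-off described above.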

Getting ready

In Chapter 1, Get Closer to Your Data, we took the HousePrices.csv file and looked at how to manipulate and prepare our data. We also analyzed and treated the missing values in the dataset. We will now use this final dataset for our model-building exercise using linear regression:
In the following code block, we will start by importing the required libraries:
# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
We set our working directory with the os.chdir() command:
# Set your working directory according to your requirement
os.chdir(".../Chapter 4/Linear Regression")
os.getcwd()
Let's read our data. We prefix the DataFrame name with df_ so that we can understand it easily:
df_housingdata = pd.read_csv("Final_HousePrices.csv")

How to do it...

Let's move on to building our model. We will start by identifying our numerical and categorical variables, and we will study the correlations using the correlation matrix and correlation plots.
  1. First, we'll take a look at the variables and the variable types:
# See the variables and their data types
df_housingdata.dtypes
  2. We'll then look at the correlation matrix. The corr() method computes the pairwise correlation of columns:
# We pass 'pearson' as the method for calculating our correlation
df_housingdata.corr(method='pearson')
  3. Besides this, we'd also like to study the correlation between the predictor variables and the response variable:
  4. ...
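One way to study predictor–response correlation is to pull the response column out of the correlation matrix and sort it. The sketch below uses synthetic data, since the response column of Final_HousePrices.csv is not named in this excerpt; df and "target" are hypothetical stand-ins for df_housingdata and its response variable:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_housingdata with a 'target' response column
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["target"] = 3 * df["x1"] + rng.normal(scale=0.1, size=100)

# Pearson correlation of every predictor with the response, strongest first
corr_with_target = df.corr(method="pearson")["target"].drop("target")
print(corr_with_target.abs().sort_values(ascending=False))
```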

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Foreword
  5. Contributors
  6. Preface
  7. Get Closer to Your Data
  8. Getting Started with Ensemble Machine Learning
  9. Resampling Methods
  10. Statistical and Machine Learning Algorithms
  11. Bag the Models with Bagging
  12. When in Doubt, Use Random Forests
  13. Boosting Model Performance with Boosting
  14. Blend It with Stacking
  15. Homogeneous Ensembles Using Keras
  16. Heterogeneous Ensemble Classifiers Using H2O
  17. Heterogeneous Ensemble for Text Classification Using NLP
  18. Homogenous Ensemble for Multiclass Classification Using Keras
  19. Other Books You May Enjoy
Citation styles for Ensemble Machine Learning Cookbook

APA 6 Citation

Sarkar, D., & Natarajan, V. (2019). Ensemble Machine Learning Cookbook (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf (Original work published 2019)

Chicago Citation

Sarkar, Dipayan, and Vijayalakshmi Natarajan. (2019) 2019. Ensemble Machine Learning Cookbook. 1st ed. Packt Publishing. https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf.

Harvard Citation

Sarkar, D. and Natarajan, V. (2019) Ensemble Machine Learning Cookbook. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Sarkar, Dipayan, and Vijayalakshmi Natarajan. Ensemble Machine Learning Cookbook. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.