Ensemble Machine Learning Cookbook
eBook - ePub

Ensemble Machine Learning Cookbook

Over 35 practical recipes to explore ensemble machine learning techniques using Python

Dipayan Sarkar, Vijayalakshmi Natarajan

  1. 336 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Implement machine learning algorithms to build ensemble models using Keras, H2O, Scikit-Learn, Pandas and more

Key Features

  • Apply popular machine learning algorithms using a recipe-based approach
  • Implement boosting, bagging, and stacking ensemble methods to improve machine learning models
  • Discover real-world ensemble applications and encounter complex challenges in Kaggle competitions

Book Description

Ensemble modeling is an approach used to improve the performance of machine learning models. It combines two or more machine learning algorithms, similar or dissimilar, to deliver superior predictive power. This book will help you to implement popular machine learning algorithms to cover different paradigms of ensemble machine learning, such as boosting, bagging, and stacking.

The Ensemble Machine Learning Cookbook will start by getting you acquainted with the basics of ensemble techniques and exploratory data analysis. You'll then learn to implement tasks related to statistical and machine learning algorithms to understand ensembles of multiple heterogeneous algorithms. It will also ensure that you don't miss out on key topics, such as resampling methods. As you progress, you'll get a better understanding of bagging, boosting, stacking, and working with the Random Forest algorithm using real-world examples. The book will highlight how these ensemble methods use multiple models to improve machine learning results, as compared to a single model. In the concluding chapters, you'll delve into advanced ensemble models using neural networks, natural language processing, and more. You'll also implement models for tasks such as fraud detection, text categorization, and sentiment analysis.

By the end of this book, you'll be able to harness ensemble techniques and the working mechanisms of machine learning algorithms to build intelligent models using individual recipes.

What you will learn

  • Understand how to use machine learning algorithms for regression and classification problems
  • Implement ensemble techniques such as averaging, weighted averaging, and max-voting
  • Get to grips with advanced ensemble methods, such as bootstrapping, bagging, and stacking
  • Use Random Forest for tasks such as classification and regression
  • Implement an ensemble of homogeneous and heterogeneous machine learning algorithms
  • Learn and implement various boosting techniques, such as AdaBoost, Gradient Boosting Machine, and XGBoost

Who this book is for

This book is designed for data scientists, machine learning developers, and deep learning enthusiasts who want to delve into machine learning algorithms to build powerful ensemble models. Working knowledge of Python programming and basic statistics is a must to help you grasp the concepts in the book.


Statistical and Machine Learning Algorithms

In this chapter, we will cover the following recipes:
  • Multiple linear regression
  • Logistic regression
  • Naive Bayes
  • Decision trees
  • Support vector machines

Technical requirements

The technical requirements for this chapter remain the same as those we detailed in Chapter 1, Get Closer to Your Data.
Visit the GitHub repository to get the dataset and the code. These are arranged by chapter and by the name of the topic. For the linear regression dataset and code, for example, visit .../Chapter 3/Linear regression.

Multiple linear regression

Multiple linear regression is a technique used to train a linear model that assumes linear relationships between multiple predictor variables ($X_1, X_2, \dots, X_m$) and a continuous target variable ($Y$). The general equation for a multiple linear regression with $m$ predictor variables is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m + \epsilon$$

Training a linear regression model involves estimating the values of the coefficients for each of the predictor variables, denoted by the letter $\beta$. In the preceding equation, $\epsilon$ denotes an error term, which is normally distributed with zero mean and constant variance. This is represented as follows:

$$\epsilon \sim N(0, \sigma^2)$$
Various techniques can be used to build a linear regression model. The most frequently used is the ordinary least squares (OLS) estimate. The OLS method produces a linear regression line that minimizes the sum of squared errors. The error is the distance from an actual data point to the regression line. The sum of squared errors measures the aggregate of the squared differences between the training instances, which are each of our data points, and the values predicted by the regression line. This can be represented as follows:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

In the preceding equation, $y_i$ is the actual training instance and $\hat{y}_i$ is the value predicted by the regression line.
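The sum of squared errors above can be computed directly. The following is a minimal sketch with NumPy on made-up numbers (not data from the book's dataset):

```python
import numpy as np

# Actual target values (y_i) and values predicted by a regression line (y_hat_i)
y_actual = np.array([3.0, 5.0, 7.0])
y_predicted = np.array([2.5, 5.5, 6.5])

# Sum of squared errors: aggregate of the squared differences
sse = np.sum((y_actual - y_predicted) ** 2)
print(sse)  # 0.75
```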
In the context of machine learning, gradient descent is a common technique that can be used to optimize the coefficients of predictor variables by minimizing the training error of the model through multiple iterations. Gradient descent starts by initializing the coefficients to zero. Then, the coefficients are updated with the intention of minimizing the error. Updating the coefficients is an iterative process and is performed until a minimum squared error is achieved.
In the gradient descent technique, a hyperparameter called the learning rate, denoted by $\alpha$, is provided to the algorithm. This parameter determines how fast the algorithm moves toward the optimal values of the coefficients. If $\alpha$ is very large, the algorithm might skip the optimal solution. If it is too small, however, the algorithm might need too many iterations to converge to the optimal coefficient values. For this reason, it is important to use the right value for $\alpha$.
In this recipe, we will use the gradient descent method to train our linear regression model.
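As a rough illustration of the mechanics described above (this is not the book's own implementation), batch gradient descent for a linear model can be sketched as follows; the coefficients start at zero and are updated iteratively against synthetic data:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Fit linear regression coefficients by batch gradient descent.

    X: (n_samples, n_features) predictor matrix; y: (n_samples,) target.
    alpha is the learning rate; coefficients are initialized to zero.
    """
    n_samples, n_features = X.shape
    # Prepend a column of ones so the intercept is learned like any coefficient
    Xb = np.hstack([np.ones((n_samples, 1)), X])
    beta = np.zeros(n_features + 1)
    for _ in range(n_iters):
        error = Xb @ beta - y          # residuals for current coefficients
        gradient = (2.0 / n_samples) * Xb.T @ error  # gradient of mean squared error
        beta -= alpha * gradient       # step toward lower error
    return beta

# Toy data generated from y = 1 + 2*x: the estimates should approach [1, 2]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
coeffs = gradient_descent(X, y, alpha=0.05, n_iters=5000)
print(coeffs)
```

Raising `alpha` too far makes the updates diverge, while a very small value converges slowly, which is the trade-off described above.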

Getting ready

In Chapter 1, Get Closer to Your Data, we took the HousePrices.csv file and looked at how to manipulate and prepare our data. We also analyzed and treated the missing values in the dataset. We will now use this final dataset for our model-building exercise using linear regression:
In the following code block, we will start by importing the required libraries:
# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
We set our working directory with the os.chdir() command:
# Set your working directory according to your requirement
os.chdir(".../Chapter 4/Linear Regression")
os.getcwd()
Let's read our data. We prefix the DataFrame name with df_ so that we can understand it easily:
df_housingdata = pd.read_csv("Final_HousePrices.csv")

How to do it...

Let's move on to building our model. We will start by identifying our numerical and categorical variables, and we will study the correlations using the correlation matrix and correlation plots.
  1. First, we'll take a look at the variables and the variable types:
# See the variables and their data types
df_housingdata.dtypes
  2. We'll then look at the correlation matrix. The corr() method computes the pairwise correlation of columns:
# We pass 'pearson' as the method for calculating our correlation
df_housingdata.corr(method='pearson')
  3. Besides this, we'd also like to study the correlation between the predictor variables and the response variable:
  4. ...
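One way to study predictor–response correlation is to pull the response column out of the correlation matrix and sort it. The sketch below uses synthetic data, since the response column of Final_HousePrices.csv is not named in this excerpt; df and "target" are hypothetical stand-ins for df_housingdata and its response variable:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_housingdata with a 'target' response column
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["target"] = 3 * df["x1"] + rng.normal(scale=0.1, size=100)

# Pearson correlation of every predictor with the response, strongest first
corr_with_target = df.corr(method="pearson")["target"].drop("target")
print(corr_with_target.abs().sort_values(ascending=False))
```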

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Foreword
  5. Contributors
  6. Preface
  7. Get Closer to Your Data
  8. Getting Started with Ensemble Machine Learning
  9. Resampling Methods
  10. Statistical and Machine Learning Algorithms
  11. Bag the Models with Bagging
  12. When in Doubt, Use Random Forests
  13. Boosting Model Performance with Boosting
  14. Blend It with Stacking
  15. Homogeneous Ensembles Using Keras
  16. Heterogeneous Ensemble Classifiers Using H2O
  17. Heterogeneous Ensemble for Text Classification Using NLP
  18. Homogenous Ensemble for Multiclass Classification Using Keras
  19. Other Books You May Enjoy
Citation styles for Ensemble Machine Learning Cookbook

APA 6 Citation

Sarkar, D., & Natarajan, V. (2019). Ensemble Machine Learning Cookbook (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf (Original work published 2019)

Chicago Citation

Sarkar, Dipayan, and Vijayalakshmi Natarajan. (2019) 2019. Ensemble Machine Learning Cookbook. 1st ed. Packt Publishing. https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf.

Harvard Citation

Sarkar, D. and Natarajan, V. (2019) Ensemble Machine Learning Cookbook. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Sarkar, Dipayan, and Vijayalakshmi Natarajan. Ensemble Machine Learning Cookbook. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.