Ensemble Machine Learning Cookbook

eBook - ePub

Over 35 practical recipes to explore ensemble machine learning techniques using Python

Dipayan Sarkar, Vijayalakshmi Natarajan

  1. 336 pages
  2. English
  3. ePUB

About This Book

Implement machine learning algorithms to build ensemble models using Keras, H2O, Scikit-Learn, Pandas and more

Key Features

  • Apply popular machine learning algorithms using a recipe-based approach
  • Implement boosting, bagging, and stacking ensemble methods to improve machine learning models
  • Discover real-world ensemble applications and encounter complex challenges in Kaggle competitions

Book Description

Ensemble modeling is an approach used to improve the performance of machine learning models. It combines two or more similar or dissimilar machine learning algorithms to deliver superior predictive power. This book will help you implement popular machine learning algorithms covering different paradigms of ensemble machine learning, such as boosting, bagging, and stacking.

The Ensemble Machine Learning Cookbook will start by getting you acquainted with the basics of ensemble techniques and exploratory data analysis. You'll then learn to implement tasks related to statistical and machine learning algorithms to understand the ensemble of multiple heterogeneous algorithms. It will also ensure that you don't miss out on key topics, such as resampling methods. As you progress, you'll get a better understanding of bagging, boosting, stacking, and working with the Random Forest algorithm using real-world examples. The book will highlight how these ensemble methods use multiple models to improve machine learning results, as compared to a single model. In the concluding chapters, you'll delve into advanced ensemble models using neural networks, natural language processing, and more. You'll also be able to implement models for fraud detection, text categorization, and sentiment analysis.

By the end of this book, you'll be able to harness ensemble techniques and the working mechanisms of machine learning algorithms to build intelligent models using individual recipes.

What you will learn

  • Understand how to use machine learning algorithms for regression and classification problems
  • Implement ensemble techniques such as averaging, weighted averaging, and max-voting
  • Get to grips with advanced ensemble methods, such as bootstrapping, bagging, and stacking
  • Use Random Forest for tasks such as classification and regression
  • Implement an ensemble of homogeneous and heterogeneous machine learning algorithms
  • Learn and implement various boosting techniques, such as AdaBoost, Gradient Boosting Machine, and XGBoost

Who this book is for

This book is designed for data scientists, machine learning developers, and deep learning enthusiasts who want to delve into machine learning algorithms to build powerful ensemble models. Working knowledge of Python programming and basic statistics is a must to help you grasp the concepts in the book.


Information

Year
2019
ISBN
9781789132502

Statistical and Machine Learning Algorithms

In this chapter, we will cover the following recipes:
  • Multiple linear regression
  • Logistic regression
  • Naive Bayes
  • Decision trees
  • Support vector machines

Technical requirements

The technical requirements for this chapter remain the same as those we detailed in Chapter 1, Get Closer to Your Data.
Visit the GitHub repository to get the dataset and the code. These are arranged by chapter and by the name of the topic. For the linear regression dataset and code, for example, visit .../Chapter 4/Linear regression.

Multiple linear regression

Multiple linear regression is a technique used to train a linear model that assumes there are linear relationships between multiple predictor variables ($X_1, X_2, \ldots, X_m$) and a continuous target variable ($Y$). The general equation for a multiple linear regression with m predictor variables is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m + \epsilon$$

Training a linear regression model involves estimating the values of the coefficients for each of the predictor variables, denoted by $\beta$. In the preceding equation, $\epsilon$ denotes the error term, which is normally distributed and has zero mean and constant variance. This is represented as follows:

$$\epsilon \sim N(0, \sigma^2)$$
Various techniques can be used to build a linear regression model. The most frequently used is the ordinary least squares (OLS) estimate. The OLS method is used to produce a linear regression line that seeks to minimize the sum of the squared errors, where the error is the distance from an actual data point to the regression line. The sum of the squared errors measures the aggregate of the squared differences between the training instances, which are each of our data points, and the values predicted by the regression line. This can be represented as follows:

$$SSE = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

In the preceding equation, $y_i$ is the actual training instance and $\hat{y}_i$ is the value predicted by the regression line.
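As a quick numerical check on the preceding equation (the values below are made up purely for illustration and are not part of the recipe), the sum of the squared errors can be computed directly with NumPy:
# import NumPy for the numerical computation
import numpy as np

# illustrative values only: actual targets and the regression line's predictions
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predicted = np.array([2.8, 5.3, 7.1, 9.4])

# sum of the squared differences between actual and predicted values
sse = np.sum((y_actual - y_predicted) ** 2)
print(sse)  # approximately 0.45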
In the context of machine learning, gradient descent is a common technique that can be used to optimize the coefficients of the predictor variables by minimizing the training error of the model over multiple iterations. Gradient descent starts by initializing the coefficients to zero. The coefficients are then updated with the intention of minimizing the error. Updating the coefficients is an iterative process that is performed until a minimum squared error is achieved.

In the gradient descent technique, a hyperparameter called the learning rate, denoted by $\alpha$, is provided to the algorithm. This parameter determines how fast the algorithm moves toward the optimal values of the coefficients. If $\alpha$ is very large, the algorithm might skip the optimal solution. If it is too small, however, the algorithm might need too many iterations to converge to the optimal coefficient values. For this reason, it is important to use the right value for $\alpha$.
In this recipe, we will use the gradient descent method to train our linear regression model.
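To make the update mechanics concrete, here is a minimal sketch of batch gradient descent for linear regression on synthetic data (an illustration only, not the recipe's code; the learning rate and iteration count are arbitrary values):
# import NumPy for the matrix computations
import numpy as np

# synthetic data: 100 samples, 3 predictors, known coefficients plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 4.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# add a column of ones for the intercept and initialize the coefficients to zero
X_b = np.c_[np.ones(len(X)), X]
beta = np.zeros(X_b.shape[1])

alpha = 0.1  # learning rate
for _ in range(1000):
    error = X_b @ beta - y
    gradient = (2.0 / len(y)) * (X_b.T @ error)  # gradient of the mean squared error
    beta = beta - alpha * gradient  # update the coefficients

print(beta)  # close to [4.0, 2.0, -1.0, 0.5]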

Getting ready

In Chapter 1, Get Closer to Your Data, we took the HousePrices.csv file and looked at how to manipulate and prepare our data. We also analyzed and treated the missing values in the dataset. We will now use this final dataset for our model-building exercise with linear regression:
In the following code block, we will start by importing the required libraries:
# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
We set our working directory with the os.chdir() command:
# Set your working directory according to your requirement
os.chdir(".../Chapter 4/Linear Regression")
os.getcwd()
Let's read in our data. We prefix the DataFrame name with df_ so that it is easy to identify:
df_housingdata = pd.read_csv("Final_HousePrices.csv")

How to do it...

Let's move on to building our model. We will start by identifying our numerical and categorical variables, and then study the correlations using the correlation matrix and correlation plots.
  1. First, we'll take a look at the variables and the variable types:
# See the variables and their data types
df_housingdata.dtypes
  2. We'll then look at the correlation matrix. The corr() method computes the pairwise correlation of columns:
# We pass 'pearson' as the method for calculating our correlation
df_housingdata.corr(method='pearson')
  3. Besides this, we'd also like to study the correlation between the predictor variables and the response variable (see the sketch after this list):
  ...
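One possible way to study this (a sketch continuing with the libraries imported earlier, not the book's printed code; it assumes the response variable in Final_HousePrices.csv is named SalePrice) is to pull the target's column out of the correlation matrix and sort it:
# keep only the numeric columns before computing correlations
df_numeric = df_housingdata.select_dtypes(include='number')

# correlation of every numeric predictor with the assumed target column 'SalePrice'
corr_with_target = (
    df_numeric.corr(method='pearson')['SalePrice']
    .drop('SalePrice')
    .sort_values(ascending=False)
)
print(corr_with_target.head(10))

# heatmap view of the full correlation matrix, using seaborn and matplotlib
sns.heatmap(df_numeric.corr(method='pearson'), cmap='coolwarm')
plt.show()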

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Foreword
  5. Contributors
  6. Preface
  7. Get Closer to Your Data
  8. Getting Started with Ensemble Machine Learning
  9. Resampling Methods
  10. Statistical and Machine Learning Algorithms
  11. Bag the Models with Bagging
  12. When in Doubt, Use Random Forests
  13. Boosting Model Performance with Boosting
  14. Blend It with Stacking
  15. Homogeneous Ensembles Using Keras
  16. Heterogeneous Ensemble Classifiers Using H2O
  17. Heterogeneous Ensemble for Text Classification Using NLP
  18. Homogenous Ensemble for Multiclass Classification Using Keras
  19. Other Books You May Enjoy
Citation Styles for Ensemble Machine Learning Cookbook

APA 6 Citation

Sarkar, D., & Natarajan, V. (2019). Ensemble Machine Learning Cookbook (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf (Original work published 2019)

Chicago Citation

Sarkar, Dipayan, and Vijayalakshmi Natarajan. (2019) 2019. Ensemble Machine Learning Cookbook. 1st ed. Packt Publishing. https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf.

Harvard Citation

Sarkar, D. and Natarajan, V. (2019) Ensemble Machine Learning Cookbook. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/876516/ensemble-machine-learning-cookbook-over-35-practical-recipes-to-explore-ensemble-machine-learning-techniques-using-python-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Sarkar, Dipayan, and Vijayalakshmi Natarajan. Ensemble Machine Learning Cookbook. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.