Ensemble Machine Learning Cookbook

Over 35 practical recipes to explore ensemble machine learning techniques using Python

Dipayan Sarkar, Vijayalakshmi Natarajan

  1. 336 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

Book Information

Implement machine learning algorithms to build ensemble models using Keras, H2O, Scikit-Learn, Pandas and more

Key Features

  • Apply popular machine learning algorithms using a recipe-based approach
  • Implement boosting, bagging, and stacking ensemble methods to improve machine learning models
  • Discover real-world ensemble applications and encounter complex challenges in Kaggle competitions

Book Description

Ensemble modeling is an approach used to improve the performance of machine learning models. It combines two or more similar or dissimilar machine learning algorithms to deliver superior predictive performance. This book will help you to implement popular machine learning algorithms to cover different paradigms of ensemble machine learning, such as boosting, bagging, and stacking.

The Ensemble Machine Learning Cookbook will start by getting you acquainted with the basics of ensemble techniques and exploratory data analysis. You'll then learn to implement tasks related to statistical and machine learning algorithms to understand the ensemble of multiple heterogeneous algorithms. It will also ensure that you don't miss out on key topics, such as resampling methods. As you progress, you'll get a better understanding of bagging, boosting, stacking, and working with the Random Forest algorithm using real-world examples. The book will highlight how these ensemble methods use multiple models to improve machine learning results, as compared to a single model. In the concluding chapters, you'll delve into advanced ensemble models using neural networks, natural language processing, and more. You'll also be able to implement models for tasks such as fraud detection, text categorization, and sentiment analysis.

By the end of this book, you'll be able to harness ensemble techniques and the working mechanisms of machine learning algorithms to build intelligent models using individual recipes.

What you will learn

  • Understand how to use machine learning algorithms for regression and classification problems
  • Implement ensemble techniques such as averaging, weighted averaging, and max-voting
  • Get to grips with advanced ensemble methods, such as bootstrapping, bagging, and stacking
  • Use Random Forest for tasks such as classification and regression
  • Implement an ensemble of homogeneous and heterogeneous machine learning algorithms
  • Learn and implement various boosting techniques, such as AdaBoost, Gradient Boosting Machine, and XGBoost

Who this book is for

This book is designed for data scientists, machine learning developers, and deep learning enthusiasts who want to delve into machine learning algorithms to build powerful ensemble models. Working knowledge of Python programming and basic statistics is a must to help you grasp the concepts in the book.


Statistical and Machine Learning Algorithms

In this chapter, we will cover the following recipes:
  • Multiple linear regression
  • Logistic regression
  • Naive Bayes
  • Decision trees
  • Support vector machines

Technical requirements

The technical requirements for this chapter remain the same as those we detailed in Chapter 1, Get Closer to Your Data.
Visit the GitHub repository to get the dataset and the code. These are arranged by chapter and by the name of the topic. For the linear regression dataset and code, for example, visit .../Chapter 3/Linear regression.

Multiple linear regression

Multiple linear regression is a technique used to train a linear model that assumes linear relationships between multiple predictor variables ($X_1, X_2, \dots, X_m$) and a continuous target variable ($Y$). The general equation for a multiple linear regression with m predictor variables is as follows:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m + \epsilon

Training a linear regression model involves estimating the values of the coefficients for each of the predictor variables, denoted by the letter $\beta$. In the preceding equation, $\epsilon$ denotes an error term, which is normally distributed and has zero mean and constant variance. This is represented as follows:

\epsilon \sim N(0, \sigma^2)
Various techniques can be used to build a linear regression model. The most frequently used is the ordinary least squares (OLS) estimate. The OLS method produces a linear regression line that seeks to minimize the sum of squared errors. The error is the distance from an actual data point to the regression line. The sum of squared errors measures the aggregate of the squared differences between the training instances, which are each of our data points, and the values predicted by the regression line. This can be represented as follows:

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

In the preceding equation, $y_i$ is the actual training instance and $\hat{y}_i$ is the value predicted by the regression line.
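To make the OLS estimate concrete, the following is a minimal sketch, not taken from the book's code, of fitting the coefficients with the normal equation in NumPy and computing the sum of squared errors; the function names, and the assumption that X is an (n, m) array of predictors and y an (n,) target vector, are illustrative:
# Minimal OLS sketch (illustrative, not the book's code)
import numpy as np

def fit_ols(X, y):
    # Add an intercept column of ones to the predictor matrix
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    # Solve the least-squares problem X_b * beta = y for the coefficients
    beta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return beta

def sum_squared_errors(X, y, beta):
    # Aggregate of the squared differences between actual and predicted values
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    residuals = y - X_b.dot(beta)
    return np.sum(residuals ** 2)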
In the context of machine learning, gradient descent is a common technique that can be used to optimize the coefficients of predictor variables by minimizing the training error of the model through multiple iterations. Gradient descent starts by initializing the coefficients to zero. Then, the coefficients are updated with the intention of minimizing the error. Updating the coefficients is an iterative process and is performed until a minimum squared error is achieved.
In the gradient descent technique, a hyperparameter called the learning rate, denoted by $\alpha$, is provided to the algorithm. This parameter determines how fast the algorithm moves toward the optimal values of the coefficients. If $\alpha$ is very large, the algorithm might skip the optimal solution. If it is too small, however, the algorithm might need too many iterations to converge to the optimum coefficient values. For this reason, it is important to use the right value for $\alpha$.
In this recipe, we will use the gradient descent method to train our linear regression model.
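Before turning to the recipe, the following is a minimal sketch of batch gradient descent for a linear model, assuming X is an (n, m) NumPy array of predictors and y an (n,) target vector; the function name and the default hyperparameter values are illustrative rather than the book's implementation:
# Batch gradient descent sketch for linear regression (illustrative)
import numpy as np

def gradient_descent_lr(X, y, learning_rate=0.01, n_iterations=1000):
    n_samples = X.shape[0]
    # Add an intercept column and initialize all coefficients to zero
    X_b = np.c_[np.ones((n_samples, 1)), X]
    coefficients = np.zeros(X_b.shape[1])
    for _ in range(n_iterations):
        predictions = X_b.dot(coefficients)
        errors = predictions - y
        # Gradient of the mean squared error with respect to the coefficients
        gradient = (2.0 / n_samples) * X_b.T.dot(errors)
        # Step toward the minimum, scaled by the learning rate
        coefficients -= learning_rate * gradient
    return coefficients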

Getting ready

In Chapter 1, Get Closer To Your Data, we took the HousePrices.csv file and looked at how to manipulate and prepare our data. We also analyzed and treated the missing values in the dataset. We will now use this final dataset for our model-building exercise, using linear regression:
In the following code block, we will start by importing the required libraries:
# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
We set our working directory with the os.chdir() command:
# Set your working directory according to your requirement
os.chdir(".../Chapter 4/Linear Regression")
os.getcwd()
Let's read our data. We prefix the DataFrame name with df_ so that we can understand it easily:
df_housingdata = pd.read_csv("Final_HousePrices.csv")

How to do it...

Let's move on to building our model. We will start by identifying our numerical and categorical variables, and then study the correlations using a correlation matrix and correlation plots.
  1. First, we'll take a look at the variables and the variable types:
# See the variables and their data types
df_housingdata.dtypes
  2. We'll then look at the correlation matrix. The corr() method computes the pairwise correlation of columns:
# We pass 'pearson' as the method for calculating our correlation
df_housingdata.corr(method='pearson')
  3. Besides this, we'd also like to study the correlation between the predictor variables and the response variable, as shown in the heatmap sketch after this list:
  ...
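As a complement to the correlation matrix above, the following is a hedged sketch of visualizing the pairwise correlations as a heatmap, reusing the seaborn and matplotlib imports from the Getting ready section; the figure size, color map, and title are illustrative choices rather than the book's code:
# Heatmap of the pairwise Pearson correlations (illustrative)
corr_matrix = df_housingdata.corr(method='pearson')

plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, cmap='coolwarm', center=0)
plt.title('Pairwise correlations in the housing data')
plt.show()
Reading down the column of the response variable in this heatmap shows how strongly each predictor correlates with the target.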

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Foreword
  5. Contributors
  6. Preface
  7. Get Closer to Your Data
  8. Getting Started with Ensemble Machine Learning
  9. Resampling Methods
  10. Statistical and Machine Learning Algorithms
  11. Bag the Models with Bagging
  12. When in Doubt, Use Random Forests
  13. Boosting Model Performance with Boosting
  14. Blend It with Stacking
  15. Homogeneous Ensembles Using Keras
  16. Heterogeneous Ensemble Classifiers Using H2O
  17. Heterogeneous Ensemble for Text Classification Using NLP
  18. Homogeneous Ensemble for Multiclass Classification Using Keras
  19. Other Books You May Enjoy