eBook - ePub

Ensemble Learning

Name: Ensemble Learning
Author: Lior Rokach

Pattern Classification Using Ensemble Methods

Lior Rokach

300 pages
English
ePUB (mobile friendly)

About This Book

This updated compendium provides a methodical introduction with a coherent and unified repository of ensemble methods, theories, trends, challenges, and applications. More than a third of this edition comprises new material, highlighting descriptions of the classic methods as well as extensions and novel approaches that have recently been introduced.

Along with algorithmic descriptions of each method, the settings in which each method is applicable and the consequences and tradeoffs incurred by using the method are succinctly featured. R code for implementing the algorithms is also emphasized.

This unique volume provides researchers, students, and practitioners in industry with a comprehensive, concise, and convenient resource on ensemble learning methods.

Contents:

Introduction to Machine Learning
Classification and Regression Trees
Introduction to Ensemble Learning
Ensemble Classification
Gradient Boosting Machines
Ensemble Diversity
Ensemble Selection
Error Correcting Output Codes
Evaluating Ensembles of Classifiers

Readership: Professionals, researchers, academics, and graduate students in artificial intelligence, databases and machine learning.

Keywords: Ensemble Learning; Random Forest; Decision Tree; Machine Learning; Data Science; Big Data; Gradient Boosting Machine

Information

Publisher

WSPC

ISBN

9789811201974

Edition

Topic

Computer Science

Subtopic

Computer Vision & Pattern Recognition

Chapter 1 Introduction to Machine Learning

Artificial intelligence (AI) is a scientific discipline that aims to create intelligent machines. Machine learning is a popular and practical AI subfield that aims to automatically improve the performance of computer programs through experience. In particular, machine learning enables computers to perform various tasks by learning from past experience rather than being explicitly programmed.

Various domains such as commerce, biology, medicine, engineering, and cyber-security apply machine learning in order to gain new insights regarding the task in question. The Google search engine is an excellent example of a service that applies machine learning on a regular basis. It is well known that Google tracks users’ clicks in an attempt to improve the relevance of its search engine results and its advertising capabilities. Moreover, users’ queries can be analyzed over time to shed light on the public’s interests. Specifically, the Google Trends service enables anyone to view search trends for a topic across regions of the world, including comparative trends of two or more topics. Entrepreneurs use this service to identify new business opportunities, and economists use it to predict market movements [Ball (2013)]. In particular, this service can help in epidemiological studies by aggregating certain search terms that are found to be good indicators of an investigated disease. For example, Ginsberg et al. (2008) used search engine query data to detect influenza epidemics. They observed that a pattern forms when all the flu-related phrases are accumulated. An analysis of these various searches reveals that many search terms associated with flu tend to be popular exactly when it is flu season.

One of the main goals of machine learning is to be able to make accurate predictions about specific phenomena. However, prediction is not an easy task. As the famous quote says, “It is difficult to make predictions, especially about the future” (attributed to Mark Twain and others). Yet, we rely on prediction all the time, and it guides our behavior and the decisions and choices we make. For example, the popular YouTube website (also owned by Google) analyzes our viewing habits in order to predict other videos we might like. Based on this prediction, YouTube presents us with personalized recommendations which are largely on target. To get a rough idea of YouTube’s capabilities in this regard, simply ask yourself how often watching a video on YouTube leads you to watch other similar videos that were recommended to you by the service. Similarly, online social networks (OSNs), such as Facebook and LinkedIn, make predictions in order to automatically suggest friends and acquaintances that we might want to connect with.

Most machine learning techniques are based on inductive learning [Mitchell (1997)], where a model is constructed explicitly or implicitly by generalizing from a sufficient number of training examples. The underlying assumption of the inductive approach is that the trained model is applicable to future unseen examples. In other words, any machine learning method based on inductive learning relies on the training set and assumes that future data shares its characteristics.

Strictly speaking, any form of inference in which the conclusions are not deductively implied by the premises can be thought of as induction. More formally, Mitchell (1997) characterizes machine learning in terms of a computer program that improves at a task (T) with respect to a performance measure (P) based on experience (E).

Traditionally, data collection was regarded as one of the most important stages in data analysis. An analyst (e.g., a statistician or data scientist) would use the available domain knowledge to select the variables to be collected. The number of variables selected was usually limited, and the collection of their values could be done manually (e.g., utilizing handwritten records or oral interviews). In the case of computer-aided analysis, the analyst had to enter the collected data into a statistical computer package or an electronic spreadsheet. Due to the high cost of data collection, people learned to make decisions based on limited information.

Since the dawn of the big data age, accumulating and storing data has become easier and less expensive. It has been estimated that the amount of stored information doubles every twenty months [Frawley et al. (1991)]. Unfortunately, as the amount of machine-readable information increases, the ability to understand and make use of it does not keep pace with its growth.
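To make the doubling estimate concrete, the following short Python sketch computes the implied growth factor over a given number of months. The function name and the default period are illustrative assumptions; the only input taken from the text is the twenty-month doubling period.

```python
def growth_factor(months: float, doubling_period: float = 20.0) -> float:
    """Factor by which a quantity grows after `months`, assuming it
    doubles every `doubling_period` months (20 is the estimate quoted above)."""
    return 2.0 ** (months / doubling_period)


five_year_factor = growth_factor(60)   # three doublings in 60 months
ten_year_factor = growth_factor(120)   # six doublings in 120 months
```

Under this assumption, stored information would grow eightfold in five years and sixty-four-fold in ten.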

1.1 Supervised Learning

1.1.1 Overview

In the machine learning community, prediction methods are commonly referred to as supervised learning. In contrast, unsupervised learning refers to modeling the distribution of instances in a typical, high-dimensional input space. According to Kohavi and Provost (1998), the term “unsupervised learning” refers to “learning techniques that group instances without a prespecified dependent attribute.”

Supervised methods attempt to discover the relationship between input attributes (sometimes called independent variables) and a target attribute (sometimes referred to as a dependent variable). The relationship that is discovered is represented in a structure referred to as a model. Usually, models describe and explain phenomena which are hidden in the dataset, and they can be used for predicting the value of the target attribute whenever the values of the input attributes are known.

Supervised learning methods can be applied in a variety of domains such as marketing, finance, and manufacturing. It is useful to distinguish between two main supervised learning models: Classification Models (Classifiers) and Regression Models. Regression models map the input space into a real-valued domain. For instance, a regression model can predict the demand for a certain product given its characteristics. Classifiers map the input space into predefined classes. For example, classifiers can be used to classify mortgage consumers as good or bad. There are many alternatives for representing classifiers, including support vector machines, decision trees, probabilistic summaries, and algebraic functions.
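As a minimal sketch of this distinction, the Python snippet below contrasts the two kinds of model by their output type. The function names, the linear coefficients, and the 0.4 debt-to-income threshold are all hypothetical and chosen only for illustration; a real model would learn such parameters from training data.

```python
from typing import Literal


def predict_demand(price: float, ad_budget: float) -> float:
    """Regression model: maps the input attributes to a real number.

    The linear coefficients are made up for illustration.
    """
    return 1000.0 - 40.0 * price + 0.5 * ad_budget


def classify_applicant(income: float, debt: float) -> Literal["good", "bad"]:
    """Classifier: maps the input attributes to one of a fixed set of classes.

    The debt-to-income threshold of 0.4 is a made-up rule.
    """
    if income <= 0:
        return "bad"
    return "good" if debt / income < 0.4 else "bad"


demand = predict_demand(price=10.0, ad_budget=200.0)     # a real value
label = classify_applicant(income=50_000, debt=10_000)   # a class label
```

The regression function can return any real value, whereas the classifier can only ever return one of the two predefined classes.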

1.1.2 The Classification Task

This book deals mainly with classification problems. Along with regression and probability estimation, classification is one of the most studied supervised learning tasks, and possibly the one with the greatest practical relevance to real-world applications.

For example, we may wish to classify flowers from the Iris genus into their species (such as Iris Setosa, Iris Versicolour, and Iris Virginica). In this case, the input vector will consist of the flower’s features, such as the length and width of the sepal and petal. The label of each instance will be one of the strings Iris Setosa, Iris Versicolour, and Iris Virginica; alternatively, the labels can take a value from {1, 2, 3}, {a, b, c}, or any other set of three distinct values.
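The point that the label set is arbitrary can be sketched in a few lines of Python. The measurements below are illustrative placeholders rather than real observations, and `encode_labels` is a hypothetical helper introduced only for this example.

```python
# Each instance pairs an input vector of four features with a class label.
# The measurements are illustrative placeholders, not real data.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

instances = [
    ((5.0, 3.5, 1.5, 0.2), "Iris Setosa"),
    ((6.0, 2.8, 4.5, 1.4), "Iris Versicolour"),
    ((6.5, 3.0, 5.5, 2.0), "Iris Virginica"),
]


def encode_labels(labels, codes):
    """Replace each distinct label (in order of first appearance) by a code."""
    distinct = list(dict.fromkeys(labels))
    if len(codes) < len(distinct) or len(set(codes)) != len(codes):
        raise ValueError("need one distinct code per distinct label")
    mapping = dict(zip(distinct, codes))
    return [mapping[label] for label in labels], mapping


labels = [label for _, label in instances]
as_ints, int_map = encode_labels(labels, [1, 2, 3])
as_chars, char_map = encode_labels(labels, ["a", "b", "c"])
```

Either encoding carries the same information for a classifier, as long as the three codes are distinct.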

Another common example of a classification task is optical character recognition (OCR). These applications convert scanned documents into machine-readable text in order to simplify their storage and retrieval. Each document undergoes four steps. First, an operator scans the document. This converts the document into a bitmap image. Next, the scanned document is segmented such that each character is isolated from the others. Then, a feature extractor measures certain features of each character such as open areas, closed shapes, diagonal lines, and line intersections. Finally, the scanned characters are associated with their corresponding alphanumeric character. The association is made by applying a machine learning algorithm to the features of the scanned characters. In this case, the set of labels/categories/classes is the set of all letters, numbers, punctuation marks, etc.
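The steps after scanning can be sketched on a toy bitmap. This is only an illustration under strong simplifying assumptions: characters are separated by fully blank columns, the two features (width and inked-pixel count) stand in for the richer features named above, and a nearest-example lookup stands in for the machine learning algorithm. All function names are hypothetical.

```python
def segment(bitmap):
    """Split a bitmap (list of equal-length strings, '#' = ink) into
    per-character sub-bitmaps, using fully blank columns as separators."""
    width = len(bitmap[0])
    blank = [all(row[c] == "." for row in bitmap) for c in range(width)]
    segments, start = [], None
    for c in range(width + 1):
        inked = c < width and not blank[c]
        if inked and start is None:
            start = c                      # a character begins here
        elif not inked and start is not None:
            segments.append([row[start:c] for row in bitmap])
            start = None                   # the character ended
    return segments


def extract_features(char_bitmap):
    """Toy feature vector: (width in pixels, number of inked pixels)."""
    return (len(char_bitmap[0]), sum(row.count("#") for row in char_bitmap))


def classify(features, labeled_examples):
    """Return the label of the closest labeled example (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labeled_examples, key=lambda ex: dist(ex[0], features))[1]


# A made-up "scanned" bitmap containing the characters I, L, I.
scanned = [
    "#.#...#",
    "#.#...#",
    "#.#...#",
    "#.#...#",
    "#.###.#",
]
labeled_examples = [((1, 5), "I"), ((3, 7), "L")]  # hypothetical training data

text = "".join(
    classify(extract_features(ch), labeled_examples) for ch in segment(scanned)
)
```

Real OCR systems need far more robust segmentation and features, but the flow of bitmap, segments, feature vectors, and predicted labels is the same.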

In order to better understand the notion of classification, consider the problem of email spam. We all suffer from email spam in which spammers exploit electronic mail systems to send unsolicited bulk messages. A spam message is any message that the user does not want to receive and did not ask to receive. In order to address the problem of spam email, machine learning is used to train a classification model for detecting spam email messages.

The spam detection model can be trained using past emails that were manually labeled as either spam or not spam. Each training example is usually represented as a set of attributes or features that characterize it. In the case of spam detection, the words in the email’s content can be used as the features. The machine learning algorithm aims to generalize from these examples and automatically classify future emails. In particular, the model, which has been trained by analyzing the text of spam emails, looks for incriminating content in a user’s new, unlabeled emails. In general, the more training data the system has, the more accurate it becomes.
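One way to sketch this idea is a small word-count classifier. The snippet below uses multinomial naive Bayes with add-one smoothing, which is a common choice for text but is an assumption here, not a method named in the text. The training emails are made up, and "ham" is used as shorthand for non-spam.

```python
import math
from collections import Counter


def tokenize(text):
    """Use the lowercase words of the email as its features."""
    return text.lower().split()


def train(examples):
    """examples: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    if doc_counts["spam"] == 0 or doc_counts["ham"] == 0:
        raise ValueError("need at least one example of each class")
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the class with the higher log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)   # class prior
        for word in tokenize(text):
            if word in vocab:                              # skip unseen words
                # Add-one smoothing avoids log(0) for words unseen in a class.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up, manually labeled training emails.
training_set = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_set)
first = predict(model, "claim your free prize")
second = predict(model, "agenda for the team meeting")
```

With this toy training set, the first message is scored as spam and the second as non-spam, because their words appeared only in the corresponding class.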

Recall that machine learning aims to improve performance for some task with experience. Note that we have the following three components:

(1) Task T that we would like to improve with learning

(2) Experience E to be used for learning

(3) Performance measure P that is used to measure the improvement

In the spam detection case, the task T is to identify spam emails. The performance measures P are the ratio of spam emails that were correctly filtered and the ratio of non-spam emails that were incorrectly filtered out. Finally, the experience E is a database of emails that were received by the user. All emails that were reported by the user as spam are considered as positive examples, while the remaining emails are assumed to be non-spam emails (negative examples).
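The two performance measures described above can be computed as follows. This is a minimal sketch: the function name is hypothetical, labels are encoded as booleans with `True` meaning spam, and the example labels are made up.

```python
def spam_filter_rates(actual, predicted):
    """Return (share of spam correctly filtered,
               share of non-spam incorrectly filtered out).

    `actual` and `predicted` are equal-length sequences of booleans,
    where True means "spam".
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n_spam = sum(1 for a in actual if a)
    n_ham = len(actual) - n_spam
    if n_spam == 0 or n_ham == 0:
        raise ValueError("need at least one spam and one non-spam email")
    caught = sum(1 for a, p in zip(actual, predicted) if a and p)
    wrongly_filtered = sum(1 for a, p in zip(actual, predicted) if not a and p)
    return caught / n_spam, wrongly_filtered / n_ham


# Made-up labels: 4 spam emails followed by 6 non-spam emails.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]

spam_caught_rate, ham_lost_rate = spam_filter_rates(actual, predicted)
```

Here three of the four spam emails are caught (0.75) and one of the six legitimate emails is wrongly filtered (about 0.17). A good filter pushes the first ratio up while keeping the second low.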

In order to automatically filter spam messages, we need to train a classification model. Obviously, data is crucial for training the classifier or, as Prof. Deming puts it: “In God we trust; all others must bring data.”

The data that is used to train the model is called the “training set.” In the spam filtering example, a training set in th...

Cover page
Title page
Copyright
Dedication
Preface
Contents
1. Introduction to Machine Learning
2. Classification and Regression Trees
3. Introduction to Ensemble Learning
4. Ensemble Classification
5. Gradient Boosting Machines
6. Ensemble Diversity
7. Ensemble Selection
8. Error Correcting Output Codes
9. Evaluating Ensembles of Classifiers
Bibliography
Index