eBook - ePub

Applied Unsupervised Learning with R

Name: Applied Unsupervised Learning with R
Author: Alok Malik, Bradford Tuckfield

Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA

320 pages
English
ePUB (mobile friendly)
Available on iOS and Android

About this book

Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.

Key Features

Build state-of-the-art algorithms that can solve your business problems
Learn how to find hidden patterns in your data
Revise key concepts with hands-on exercises using real-world datasets

Book Description

Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.

This book begins with clustering, the most important and commonly used method for unsupervised learning, and explains the three main clustering algorithms: k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.

By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.

What you will learn

Implement clustering methods such as k-means, agglomerative, and divisive
Write code in R to analyze market segmentation and consumer behavior
Estimate distribution and probabilities of different outcomes
Implement dimension reduction using principal component analysis
Apply anomaly detection methods to identify fraud
Design algorithms with R and learn how to edit or improve code

Who this book is for

Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and for developers who have an interest in unsupervised learning. Although the book is for beginners, some basic familiarity with R will be beneficial. This includes knowing how to open the R console, how to read data, and how to create a loop. To follow the concepts in this book easily, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.

Information

Publisher: Packt Publishing
Year: 2019
ISBN: 9781789951462
Edition: 1st
Subject: Computer Science
Subtopic: Artificial Intelligence (AI) & Semantics

Chapter 1 Introduction to Clustering Methods

Learning Objectives

By the end of this chapter, you will be able to:

Describe the uses of clustering
Perform the k-means algorithm using built-in R libraries
Perform the k-medoids algorithm using built-in R libraries
Determine the optimum number of clusters

In this chapter, we will have a look at the concept of clustering and some basic clustering algorithms.

Introduction

The 21st century is the digital century: people on every rung of the economic ladder use digital devices and produce data in digital format at an unprecedented rate. Of all the data generated in the last 10 years, 90% was generated in the last 2 years. This is an exponential rate of growth: the amount of data increases roughly tenfold every 2 years. This trend is expected to continue for the foreseeable future:

Figure 1.1: The increase in digital data year on year
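
The link between the two figures quoted above (90% in the last 2 years, tenfold growth every 2 years) can be checked with a few lines of R. This is only a sketch under a stated assumption: the cumulative amount of data grows by a constant factor in every 2-year period. The names `growth_factor` and `total_data` are illustrative and not from the book.

```r
# Assumption: the cumulative amount of data grows tenfold every 2 years.
growth_factor <- 10

# Cumulative amount of data after a given number of years,
# in arbitrary units (1 unit at year 0).
total_data <- function(years) growth_factor^(years / 2)

# Data created during the last 2 and the last 10 years of a 10-year window
created_last_2  <- total_data(10) - total_data(8)
created_last_10 <- total_data(10) - total_data(0)

# Share of the last decade's data that was created in its final 2 years
share_last_2 <- created_last_2 / created_last_10
print(round(share_last_2, 3))
```

Under this assumption the share comes out at about 0.9, which is consistent with the 90% figure.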

But this data is not just stored on hard drives; it's being used to make lives better. For example, Google uses the data it has to serve you better results, and Netflix uses the data it has to serve you better movie recommendations. In fact, Netflix's decision to make its hit show House of Cards was based on analytics. IBM is using the medical data it has to create an artificially intelligent doctor and to detect cancerous tumors from x-ray images.

To process this amount of data with computers and come up with relevant results, a particular class of algorithms is used. These algorithms are collectively known as machine learning algorithms. Machine learning is divided into two parts, depending on the type of data that is being used: one is called supervised learning and the other is called unsupervised learning.

Supervised learning is done when we get labeled data. For example, say we get 1,000 images of x-rays from a hospital that are labeled as normal or fractured. We can use this data to train a machine learning model to predict whether an x-ray image shows a fractured bone or not.

Unsupervised learning is when we have only raw data and are expected to come up with insights without any labels. The goal is to understand the data and recognize patterns in it without being told explicitly what patterns to identify. By the end of this book, you're going to be aware of all of the major types of unsupervised learning algorithms. In this book, we're going to be using the R programming language for demonstration, but the algorithms are the same for all languages.

In this chapter, we're going to study the most basic type of unsupervised learning: clustering. First, we're going to study what clustering is, its types, and how to create clusters with any type of dataset. Then we're going to study how each type of clustering works, looking at their advantages and disadvantages. Finally, we're going to learn when to use each type of clustering.

Introduction to Clustering

Clustering is a set of methods or algorithms that are used to find natural groupings according to predefined properties of variables in a dataset. The Merriam-Webster dictionary defines a cluster as "a number of similar things that occur together." Clustering in unsupervised learning is exactly what it means in the traditional sense. For example, how do you identify a bunch of grapes from far away? You have an intuitive sense without looking closely at the bunch whether the grapes are connected to each other or not. Clustering is just like that. An example of clustering is presented here:

Figure 1.2: A representation of two clusters in a dataset

In the preceding graph, the data points have two properties: cholesterol and blood pressure. The data points are grouped into two clusters, or two bunches, according to the Euclidean distance between them. One cluster contains people who are clearly at high risk of heart disease and the other cluster contains people who are at low risk of heart disease. There can be more than two clusters, too, as in the following example:

Figure 1.3: A representation of three clusters in a dataset

In the preceding graph, there are three clusters. One additional group of people has high blood pressure but low cholesterol. This group may or may not have a risk of heart disease. In further sections, clustering will be illustrated on real datasets in which the x and y coordinates denote actual quantities.
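
As a rough illustration of the three-cluster picture described above, the following sketch generates made-up cholesterol and blood pressure values and groups them with `kmeans()` from R's built-in stats package. The data, the group means, and the helper `make_group()` are assumptions invented for this example; they are not the book's dataset or code.

```r
set.seed(42)  # make the synthetic data and the k-means starts reproducible

# Hypothetical helper: simulate n people around given mean values (made-up numbers)
make_group <- function(n, chol_mean, bp_mean) {
  data.frame(
    cholesterol    = rnorm(n, mean = chol_mean, sd = 10),
    blood_pressure = rnorm(n, mean = bp_mean, sd = 5)
  )
}

patients <- rbind(
  make_group(50, chol_mean = 160, bp_mean = 110),  # low cholesterol, low blood pressure
  make_group(50, chol_mean = 260, bp_mean = 160),  # high cholesterol, high blood pressure
  make_group(50, chol_mean = 160, bp_mean = 160)   # low cholesterol, high blood pressure
)

# Standardize both columns so that each contributes comparably to Euclidean distance
scaled <- scale(patients)

# Ask k-means for three clusters; nstart = 25 tries 25 random starts and keeps the best
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Cluster sizes and the average of each variable per cluster
print(table(fit$cluster))
print(aggregate(patients, by = list(cluster = fit$cluster), FUN = mean))

# Scatter plot colored by assigned cluster
with(patients, plot(cholesterol, blood_pressure, col = fit$cluster, pch = 19))
```

Note that no labels are passed to `kmeans()`: the groups are found from the distances between points alone, and the cluster numbers it returns are arbitrary identifiers.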

Uses of Clustering

Like all methods of unsupervised learning, clustering is mostly used when we don't have labeled data (data with predefined classes) for training our models. Clustering uses various distance measures, such as Euclidean distance and Manhattan distance, to find patterns in the data and group data points according to similarities in their properties, without having any labels for training. So, clustering has many use cases in fields where labeled data is unavailable or where we want to find patterns that are not defined by labels.
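
To make the two distance measures mentioned above concrete, here is a minimal sketch using R's built-in `dist()` function. The two data points and their values are made up for illustration.

```r
# Two hypothetical people with made-up (cholesterol, blood pressure) values
pts <- rbind(
  a = c(cholesterol = 180, blood_pressure = 120),
  b = c(cholesterol = 210, blood_pressure = 160)
)

# Euclidean distance: straight-line distance, sqrt(30^2 + 40^2)
euclidean <- dist(pts, method = "euclidean")

# Manhattan distance: sum of absolute differences, |30| + |40|
manhattan <- dist(pts, method = "manhattan")

print(euclidean)  # 50
print(manhattan)  # 70
```

The same pair of points is 50 units apart under one measure and 70 under the other, so the choice of distance can change which points a clustering algorithm treats as similar.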

The following are some applications of clustering:

Exploratory data analysis: When we have unlabeled data, we often do clustering to explore the underlying structure and categories of the dataset. For example, a retail store might want to explore how many different segments of customers they have, based on purchase history.
Generating training data: Sometimes, after processing unlabeled data with clustering methods, it can be labeled for further training with supervised learning algorithms. For example, two different classes that are unlabeled might form two entirely different clusters, and using their clusters, we can label data for further supervised learning algorithms that are more efficient in real-time classification than our unsupervised learning algorithms.
Recommender systems: With the help of clustering, we can find the properties of similar items and use these properties to make recommendations. For example, an e-commerce website, after finding customers in the same cluster, can recommend items to customers in that cluster based upon the items bought by other customers in that cluster.
Natural language processing: Clustering can be used for the grouping of similar words, texts, articles, or tweets, without labeled data. For example, you might want to group articles on the same topic automatically.
Anomaly detection: You can use clustering to find outliers. We're going to learn about this in Chapter 6, Anomaly Detection. Anomaly dete...

Table of contents

Preface
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Appendix

Citation styles for Applied Unsupervised Learning with R

APA 6 Citation

Malik, A., & Tuckfield, B. (2019). Applied Unsupervised Learning with R (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Original work published 2019)

Chicago Citation

Malik, Alok, and Bradford Tuckfield. (2019) 2019. Applied Unsupervised Learning with R. 1st ed. Packt Publishing. https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf.

Harvard Citation

Malik, A. and Tuckfield, B. (2019) Applied Unsupervised Learning with R. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Malik, Alok, and Bradford Tuckfield. Applied Unsupervised Learning with R. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.