Applied Unsupervised Learning with R
eBook - ePub

Applied Unsupervised Learning with R

Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA

Alok Malik, Bradford Tuckfield

  1. 320 Seiten
  2. English
  3. ePUB (handyfreundlich)
  4. Über iOS und Android verfügbar
eBook - ePub

Applied Unsupervised Learning with R

Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA

Alok Malik, Bradford Tuckfield

Angaben zum Buch
Buchvorschau
Inhaltsverzeichnis
Quellenangaben

Über dieses Buch

Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.

Key Features

  • Build state-of-the-art algorithms that can solve your business' problems
  • Learn how to find hidden patterns in your data
  • Revise key concepts with hands-on exercises using real-world datasets

Book Description

Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.

This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms - k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.

By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.

What you will learn

  • Implement clustering methods such as k-means, agglomerative, and divisive
  • Write code in R to analyze market segmentation and consumer behavior
  • Estimate distribution and probabilities of different outcomes
  • Implement dimension reduction using principal component analysis
  • Apply anomaly detection methods to identify fraud
  • Design algorithms with R and learn how to edit or improve code

Who this book is for

Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and developers who have an interest in unsupervised learning. Although the book is for beginners, it will be beneficial to have some basic, beginner-level familiarity with R. This includes an understanding of how to open the R console, how to read data, and how to create a loop. To easily understand the concepts of this book, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.

Häufig gestellte Fragen

Wie kann ich mein Abo kündigen?
Gehe einfach zum Kontobereich in den Einstellungen und klicke auf „Abo kündigen“ – ganz einfach. Nachdem du gekündigt hast, bleibt deine Mitgliedschaft für den verbleibenden Abozeitraum, den du bereits bezahlt hast, aktiv. Mehr Informationen hier.
(Wie) Kann ich Bücher herunterladen?
Derzeit stehen all unsere auf Mobilgeräte reagierenden ePub-Bücher zum Download über die App zur Verfügung. Die meisten unserer PDFs stehen ebenfalls zum Download bereit; wir arbeiten daran, auch die übrigen PDFs zum Download anzubieten, bei denen dies aktuell noch nicht möglich ist. Weitere Informationen hier.
Welcher Unterschied besteht bei den Preisen zwischen den Aboplänen?
Mit beiden Aboplänen erhältst du vollen Zugang zur Bibliothek und allen Funktionen von Perlego. Die einzigen Unterschiede bestehen im Preis und dem Abozeitraum: Mit dem Jahresabo sparst du auf 12 Monate gerechnet im Vergleich zum Monatsabo rund 30 %.
Was ist Perlego?
Wir sind ein Online-Abodienst für Lehrbücher, bei dem du für weniger als den Preis eines einzelnen Buches pro Monat Zugang zu einer ganzen Online-Bibliothek erhältst. Mit über 1 Million Büchern zu über 1.000 verschiedenen Themen haben wir bestimmt alles, was du brauchst! Weitere Informationen hier.
Unterstützt Perlego Text-zu-Sprache?
Achte auf das Symbol zum Vorlesen in deinem nächsten Buch, um zu sehen, ob du es dir auch anhören kannst. Bei diesem Tool wird dir Text laut vorgelesen, wobei der Text beim Vorlesen auch grafisch hervorgehoben wird. Du kannst das Vorlesen jederzeit anhalten, beschleunigen und verlangsamen. Weitere Informationen hier.
Ist Applied Unsupervised Learning with R als Online-PDF/ePub verfügbar?
Ja, du hast Zugang zu Applied Unsupervised Learning with R von Alok Malik, Bradford Tuckfield im PDF- und/oder ePub-Format sowie zu anderen beliebten Büchern aus Ciencia de la computación & Inteligencia artificial (IA) y semántica. Aus unserem Katalog stehen dir über 1 Million Bücher zur Verfügung.

Chapter 1

Introduction to Clustering Methods

Learning Objectives

By the end of this chapter, you will be able to:
  • Describe the uses of clustering
  • Perform the k-means algorithm using built-in R libraries
  • Perform the k-medoids algorithm using built-in R libraries
  • Determine the optimum number of clusters
In this chapter, we will have a look at the concept of clustering and some basic clustering algorithms.

Introduction

The 21st century is the digital century, where every person on every rung of the economic ladder is using digital devices and producing data in digital format at an unprecedented rate. 90% of data generated in the last 10 years was generated in the last 2 years. This is an exponential rate of growth, where the amount of data is increasing by 10 times every 2 years. This trend is expected to continue for the foreseeable future:
Figure 1.1: Increase in data year on year
Figure 1.1: The increase in digital data year on year
But this data is not just stored in hard drives; it's being used to make lives better. For example, Google uses the data it has to serve you better results, and Netflix uses the data it has to serve you better movie recommendations. In fact, their decision to make their hit show House of Cards was based on analytics. IBM is using the medical data it has to create an artificially intelligent doctor and to detect cancerous tumors from x-ray images.
To process this amount of data with computers and come up with relevant results, a particular class of algorithms is used. These algorithms are collectively known as machine learning algorithms. Machine learning is divided into two parts, depending on the type of data that is being used: one is called supervised learning and the other is called unsupervised learning.
Supervised learning is done when we get labeled data. For example, say we get 1,000 images of x-rays from a hospital that are labeled as normal or fractured. We can use this data to train a machine learning model to predict whether an x-ray image shows a fractured bone or not.
Unsupervised learning is when we just have raw data and are expected to come up with insights without any labels. We have the ability to understand the data and recognize patterns in it without explicitly being told what patterns to identify. By the end of this book, you're going to be aware of all of the major types of unsupervised learning algorithms. In this book, we're going to be using the R programming language for demonstration, but the algorithms are the same for all languages.
In this chapter, we're going to study the most basic type of unsupervised learning, clustering. At first, we're going to study what clustering is, its types, and how to create clusters with any type of dataset. Then we're going to study how each type of clustering works, looking at their advantages and disadvantages. At the end, we're going to learn when to use which type of clustering.

Introduction to Clustering

Clustering is a set of methods or algorithms that are used to find natural groupings according to predefined properties of variables in a dataset. The Merriam-Webster dictionary defines a cluster as "a number of similar things that occur together." Clustering in unsupervised learning is exactly what it means in the traditional sense. For example, how do you identify a bunch of grapes from far away? You have an intuitive sense without looking closely at the bunch whether the grapes are connected to each other or not. Clustering is just like that. An example of clustering is presented here:
Figure 1.2: Representation of two clusters in a dataset
Figure 1.2: A representation of two clusters in a dataset
In the preceding graph, the data points have two properties: cholesterol and blood pressure. The data points are classified into two clusters, or two bunches, according to the Euclidean distance between them. One cluster contains people who are clearly at high risk of heart disease and the other cluster contains people who are at low risk of heart disease. There can be more than two clusters, too, as in the following example:
Figure 1.3: Representation of three clusters in a dataset
Figure 1.3: A representation of three clusters in a dataset
In the preceding graph, there are three clusters. One additional group of people has high blood pressure but with low cholesterol. This group may or may not have a risk of heart disease. In further sections, clustering will be illustrated on real datasets in which the x and y coordinates denote actual quantities.

Uses of Clustering

Like all methods of unsupervised learning, clustering is mostly used when we don't have labeled data – data with predefined classes – for training our models. Clustering uses various properties, such as Euclidean distance and Manhattan distance, to find patterns in the data and classify them according to similarities in their properties without having any labels for training. So, clustering has many use cases in fields where labeled data is unavailable or we want to find patterns that are not defined by labels.
The following are some applications of clustering:
  • Exploratory data analysis: When we have unlabeled data, we often do clustering to explore the underlying structure and categories of the dataset. For example, a retail store might want to explore how many different segments of customers they have, based on purchase history.
  • Generating training data: Sometimes, after processing unlabeled data with clustering methods, it can be labeled for further training with supervised learning algorithms. For example, two different classes that are unlabeled might form two entirely different clusters, and using their clusters, we can label data for further supervised learning algorithms that are more efficient in real-time classification than our unsupervised learning algorithms.
  • Recommender systems: With the help of clustering, we can find the properties of similar items and use these properties to make recommendations. For example, an e-commerce website, after finding customers in the same clusters, can recommend items to customers in that cluster based upon the items bought by other customers in that cluster.
  • Natural language processing: Clustering can be used for the grouping of similar words, texts, articles, or tweets, without labeled data. For example, you might want to group articles on the same topic automatically.
  • Anomaly detection: You can use clustering to find outliers. We're going to learn about this in Chapter 6, Anomaly Detection. Anomaly dete...

Inhaltsverzeichnis

  1. Preface
  2. Chapter 1
  3. Chapter 2
  4. Chapter 3
  5. Chapter 4
  6. Chapter 5
  7. Chapter 6
  8. Appendix
Zitierstile für Applied Unsupervised Learning with R

APA 6 Citation

Malik, A., & Tuckfield, B. (2019). Applied Unsupervised Learning with R (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Original work published 2019)

Chicago Citation

Malik, Alok, and Bradford Tuckfield. (2019) 2019. Applied Unsupervised Learning with R. 1st ed. Packt Publishing. https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf.

Harvard Citation

Malik, A. and Tuckfield, B. (2019) Applied Unsupervised Learning with R. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Malik, Alok, and Bradford Tuckfield. Applied Unsupervised Learning with R. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.