Applied Unsupervised Learning with R
eBook - ePub

Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA

Alok Malik, Bradford Tuckfield

  1. 320 pages
  2. English
  3. ePUB (available on the app)
  4. Available on iOS and Android

About this book

Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.

Key Features

  • Build state-of-the-art algorithms that can solve your business' problems
  • Learn how to find hidden patterns in your data
  • Revise key concepts with hands-on exercises using real-world datasets

Book Description

Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.

This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms - k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.

By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.

What you will learn

  • Implement clustering methods such as k-means, agglomerative, and divisive
  • Write code in R to analyze market segmentation and consumer behavior
  • Estimate distribution and probabilities of different outcomes
  • Implement dimension reduction using principal component analysis
  • Apply anomaly detection methods to identify fraud
  • Design algorithms with R and learn how to edit or improve code

Who this book is for

Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and developers who have an interest in unsupervised learning. Although the book is for beginners, it will be beneficial to have some basic, beginner-level familiarity with R. This includes an understanding of how to open the R console, how to read data, and how to create a loop. To easily understand the concepts of this book, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.

Information

Year
2019
ISBN
9781789951462

Chapter 1

Introduction to Clustering Methods

Learning Objectives

By the end of this chapter, you will be able to:
  • Describe the uses of clustering
  • Perform the k-means algorithm using built-in R libraries
  • Perform the k-medoids algorithm using built-in R libraries
  • Determine the optimum number of clusters
In this chapter, we will have a look at the concept of clustering and some basic clustering algorithms.

Introduction

The 21st century is the digital century: people on every rung of the economic ladder are using digital devices and producing digital data at an unprecedented rate. An often-quoted estimate is that 90% of the data generated in the last 10 years was generated in the last 2 years. Taken at face value, this implies roughly exponential growth, with the amount of data increasing about tenfold every 2 years. This trend is expected to continue for the foreseeable future:
Figure 1.1: The increase in digital data year on year
But this data is not just stored on hard drives; it's being used to make lives better. For example, Google uses the data it has to serve you better search results, and Netflix uses the data it has to serve you better movie recommendations. In fact, Netflix's decision to make its hit show House of Cards was based on analytics. IBM is using the medical data it has to create an artificially intelligent doctor and to detect cancerous tumors from X-ray images.
To process this amount of data with computers and come up with relevant results, a particular class of algorithms is used, collectively known as machine learning algorithms. Machine learning is commonly divided into two main types, depending on the kind of data being used: supervised learning and unsupervised learning.
Supervised learning is used when we have labeled data. For example, say we get 1,000 X-ray images from a hospital, each labeled as normal or fractured. We can use this data to train a machine learning model to predict whether an X-ray image shows a fractured bone or not.
Unsupervised learning is when we have only raw data and are expected to come up with insights without any labels. The aim is to understand the data and recognize patterns in it without being told explicitly which patterns to look for. By the end of this book, you will be aware of all of the major types of unsupervised learning algorithms. In this book, we use the R programming language for demonstration, but the algorithms are the same in any language.
In this chapter, we're going to study the most basic type of unsupervised learning: clustering. First, we're going to study what clustering is, its types, and how to create clusters with any type of dataset. Then we're going to study how each type of clustering works, looking at its advantages and disadvantages. Finally, we're going to learn when to use which type of clustering.

Introduction to Clustering

Clustering is a set of methods or algorithms used to find natural groupings in a dataset according to predefined properties of its variables. The Merriam-Webster dictionary defines a cluster as "a number of similar things that occur together." Clustering in unsupervised learning means exactly what it does in the traditional sense. For example, how do you identify a bunch of grapes from far away? Without looking closely at the bunch, you have an intuitive sense of whether the grapes are connected to each other or not. Clustering is just like that. An example of clustering is presented here:
Figure 1.2: A representation of two clusters in a dataset
In the preceding graph, each data point has two properties: cholesterol and blood pressure. The data points are grouped into two clusters, or two bunches, according to the Euclidean distance between them. One cluster contains people who are clearly at high risk of heart disease and the other cluster contains people who are at low risk of heart disease. There can be more than two clusters, too, as in the following example:
Figure 1.3: A representation of three clusters in a dataset
In the preceding graph, there are three clusters. One additional group of people has high blood pressure but low cholesterol. This group may or may not be at risk of heart disease. In later sections, clustering will be illustrated on real datasets in which the x and y coordinates denote actual quantities.
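The idea of grouping people by cholesterol and blood pressure can be sketched in R. This is only an illustration under stated assumptions: the data is synthetic (made-up means and spreads, not the data behind the figures), the variable names are hypothetical, and k-means via the base R function kmeans() is one possible way to form the clusters, not necessarily how the figures were produced.

```r
set.seed(42)  # make the random data and the k-means starts reproducible

# Synthetic, made-up data: 50 "low-risk" and 50 "high-risk" people,
# each described by cholesterol and blood pressure.
low  <- cbind(cholesterol    = rnorm(50, mean = 170, sd = 10),
              blood_pressure = rnorm(50, mean = 115, sd = 5))
high <- cbind(cholesterol    = rnorm(50, mean = 260, sd = 10),
              blood_pressure = rnorm(50, mean = 160, sd = 5))
people <- rbind(low, high)

# Ask k-means for two clusters; nstart = 10 tries several random starts
# and keeps the best result.
fit <- kmeans(people, centers = 2, nstart = 10)

print(fit$centers)         # one row per cluster center
print(table(fit$cluster))  # number of people assigned to each cluster
```

Note that no labels are passed to kmeans(): the two groups are recovered from the distances between points alone. Changing centers = 2 to centers = 3 would request three clusters, as in the second figure.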

Uses of Clustering

Like all methods of unsupervised learning, clustering is mostly used when we don't have labeled data – data with predefined classes – for training our models. Clustering uses various distance measures, such as Euclidean distance and Manhattan distance, to find patterns in the data and group data points according to similarities in their properties, without having any labels for training. So, clustering has many use cases in fields where labeled data is unavailable or where we want to find patterns that are not defined by labels.
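As a small illustration of the two distances named above, the following sketch computes both for a pair of points using the base R function dist(). The two "patients" and their values are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Two hypothetical patients described by (cholesterol, blood pressure).
# The values are made up for illustration only.
patients <- rbind(
  a = c(cholesterol = 180, blood_pressure = 120),
  b = c(cholesterol = 240, blood_pressure = 150)
)

# Euclidean distance: straight-line distance, sqrt(60^2 + 30^2)
euclidean <- as.numeric(dist(patients, method = "euclidean"))

# Manhattan distance: sum of absolute differences, |60| + |30|
manhattan <- as.numeric(dist(patients, method = "manhattan"))

print(euclidean)
print(manhattan)
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so the choice of measure can change which points count as "close" and therefore which clusters are formed.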
The following are some applications of clustering:
  • Exploratory data analysis: When we have unlabeled data, we often use clustering to explore the underlying structure and categories of the dataset. For example, a retail store might want to explore how many different customer segments it has, based on purchase history.
  • Generating training data: Sometimes, after processing unlabeled data with clustering methods, it can be labeled for further training with supervised learning algorithms. For example, two different classes that are unlabeled might form two entirely different clusters, and using their clusters, we can label data for further supervised learning algorithms that are more efficient in real-time classification than our unsupervised learning algorithms.
  • Recommender systems: With the help of clustering, we can find the properties of similar items and use these properties to make recommendations. For example, an e-commerce website, after finding customers in the same clusters, can recommend items to customers in that cluster based upon the items bought by other customers in that cluster.
  • Natural language processing: Clustering can be used for the grouping of similar words, texts, articles, or tweets, without labeled data. For example, you might want to group articles on the same topic automatically.
  • Anomaly detection: You can use clustering to find outliers. We're going to learn about this in Chapter 6, Anomaly Detection. Anomaly dete...

Table of contents

  1. Preface
  2. Chapter 1
  3. Chapter 2
  4. Chapter 3
  5. Chapter 4
  6. Chapter 5
  7. Chapter 6
  8. Appendix
Citation styles for Applied Unsupervised Learning with R

APA 6 Citation

Malik, A., & Tuckfield, B. (2019). Applied Unsupervised Learning with R (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Original work published 2019)

Chicago Citation

Malik, Alok, and Bradford Tuckfield. (2019) 2019. Applied Unsupervised Learning with R. 1st ed. Packt Publishing. https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf.

Harvard Citation

Malik, A. and Tuckfield, B. (2019) Applied Unsupervised Learning with R. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Malik, Alok, and Bradford Tuckfield. Applied Unsupervised Learning with R. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.