Applied Unsupervised Learning with R

Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA

Alok Malik, Bradford Tuckfield

  1. 320 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.

Key Features

  • Build state-of-the-art algorithms that can solve your business problems
  • Learn how to find hidden patterns in your data
  • Revise key concepts with hands-on exercises using real-world datasets

Book Description

Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.

This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms - k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.

By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.

What you will learn

  • Implement clustering methods such as k-means, agglomerative, and divisive
  • Write code in R to analyze market segmentation and consumer behavior
  • Estimate distribution and probabilities of different outcomes
  • Implement dimension reduction using principal component analysis
  • Apply anomaly detection methods to identify fraud
  • Design algorithms with R and learn how to edit or improve code

Who this book is for

Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and developers who have an interest in unsupervised learning. Although the book is for beginners, it will be beneficial to have some basic, beginner-level familiarity with R. This includes an understanding of how to open the R console, how to read data, and how to create a loop. To easily understand the concepts of this book, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.

Information

Chapter 1

Introduction to Clustering Methods

Learning Objectives

By the end of this chapter, you will be able to:
  • Describe the uses of clustering
  • Perform the k-means algorithm using built-in R libraries
  • Perform the k-medoids algorithm using built-in R libraries
  • Determine the optimum number of clusters
In this chapter, we will have a look at the concept of clustering and some basic clustering algorithms.

Introduction

The 21st century is the digital century: people on every rung of the economic ladder use digital devices and produce digital data at an unprecedented rate. 90% of the data generated in the last 10 years was generated in the last 2 years. This is exponential growth: if 90% of the data is less than 2 years old, the total has grown roughly tenfold in 2 years. This trend is expected to continue for the foreseeable future:
Figure 1.1: The increase in digital data year on year
But this data is not just stored on hard drives; it's being used to make lives better. For example, Google uses the data it has to serve you better results, and Netflix uses the data it has to serve you better movie recommendations. In fact, Netflix's decision to make its hit show House of Cards was based on analytics. IBM is using the medical data it has to create an artificially intelligent doctor and to detect cancerous tumors from X-ray images.
To process this amount of data with computers and come up with relevant results, a particular class of algorithms is used. These algorithms are collectively known as machine learning algorithms. Machine learning is commonly divided into two main types, depending on whether the data being used is labeled: supervised learning and unsupervised learning.
Supervised learning applies when we have labeled data. For example, say we get 1,000 X-ray images from a hospital, each labeled as normal or fractured. We can use this data to train a machine learning model to predict whether an X-ray image shows a fractured bone or not.
Unsupervised learning applies when we have only raw data and are expected to come up with insights without any labels. The goal is to understand the data and recognize patterns in it without being told explicitly what patterns to identify. By the end of this book, you're going to be aware of all of the major types of unsupervised learning algorithms. In this book, we're going to be using the R programming language for demonstration, but the algorithms are the same in all languages.
In this chapter, we're going to study the most basic type of unsupervised learning: clustering. First, we're going to study what clustering is, its types, and how to create clusters with any type of dataset. Then we're going to study how each type of clustering works, looking at its advantages and disadvantages. Finally, we're going to learn when to use which type of clustering.

Introduction to Clustering

Clustering is a set of methods or algorithms that are used to find natural groupings according to predefined properties of variables in a dataset. The Merriam-Webster dictionary defines a cluster as "a number of similar things that occur together." Clustering in unsupervised learning is exactly what it means in the traditional sense. For example, how do you identify a bunch of grapes from far away? Without looking closely at the bunch, you have an intuitive sense of whether the grapes are connected to each other or not. Clustering is just like that. An example of clustering is presented here:
Figure 1.2: A representation of two clusters in a dataset
In the preceding graph, the data points have two properties: cholesterol and blood pressure. The data points are grouped into two clusters, or two bunches, according to the Euclidean distance between them. One cluster contains people who are clearly at high risk of heart disease and the other cluster contains people who are at low risk of heart disease. There can be more than two clusters, too, as in the following example:
Figure 1.3: A representation of three clusters in a dataset
In the preceding graph, there are three clusters. One additional group of people has high blood pressure but low cholesterol. This group may or may not have a risk of heart disease. In later sections, clustering will be illustrated on real datasets in which the x and y coordinates denote actual quantities.
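The two-cluster picture described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the cholesterol and blood pressure values are synthetic (invented here, not taken from the book's figures), and k-means from base R's `stats` package is used as one possible clustering method, since the text does not say which method produced the figures.

```r
# Synthetic data: all values below are invented for illustration only.
set.seed(42)
low_risk  <- data.frame(cholesterol    = rnorm(50, mean = 170, sd = 10),
                        blood_pressure = rnorm(50, mean = 115, sd = 6))
high_risk <- data.frame(cholesterol    = rnorm(50, mean = 260, sd = 10),
                        blood_pressure = rnorm(50, mean = 160, sd = 6))
patients <- rbind(low_risk, high_risk)

# Standardize both columns so neither dominates the Euclidean distance.
scaled <- scale(patients)

# k-means with two clusters; nstart repeats the random initialization
# and keeps the best result.
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Cluster sizes, and the mean of each property per cluster.
print(table(fit$cluster))
print(aggregate(patients, by = list(cluster = fit$cluster), FUN = mean))
```

Changing `centers = 2` to `centers = 3` would ask for three groups instead, as in the second figure, although the synthetic data here contains only two.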

Uses of Clustering

Like all methods of unsupervised learning, clustering is mostly used when we don't have labeled data (data with predefined classes) for training our models. Clustering uses distance measures, such as Euclidean distance and Manhattan distance, to find patterns in the data and group data points according to similarities in their properties, without having any labels for training. So, clustering has many use cases in fields where labeled data is unavailable or where we want to find patterns that are not defined by labels.
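The two distance measures named above can be computed with base R's `dist()` function. A minimal sketch, using two made-up points (`a` and `b` are hypothetical and not from the book):

```r
# Two hypothetical data points, each with two properties.
pts <- rbind(a = c(1, 2), b = c(4, 6))

# Euclidean distance: straight-line distance, sqrt(3^2 + 4^2).
euclidean <- as.numeric(dist(pts, method = "euclidean"))

# Manhattan distance: sum of absolute differences, |3| + |4|.
manhattan <- as.numeric(dist(pts, method = "manhattan"))

print(euclidean)  # 5
print(manhattan)  # 7
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points, so the choice of measure can change which points count as "close".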
The following are some applications of clustering:
  • Exploratory data analysis: When we have unlabeled data, we often do clustering to explore the underlying structure and categories of the dataset. For example, a retail store might want to explore how many different segments of customers they have, based on purchase history.
  • Generating training data: Sometimes, after unlabeled data has been processed with clustering methods, it can be labeled for further training with supervised learning algorithms. For example, two unlabeled classes might form two entirely different clusters. We can then use the cluster assignments to label the data for supervised learning algorithms, which are more efficient at real-time classification than our unsupervised learning algorithms.
  • Recommender systems: With the help of clustering, we can find the properties of similar items and use these properties to make recommendations. For example, an e-commerce website, after finding customers that belong to the same cluster, can recommend items to customers in that cluster based upon the items bought by other customers in that cluster.
  • Natural language processing: Clustering can be used for the grouping of similar words, texts, articles, or tweets, without labeled data. For example, you might want to group articles on the same topic automatically.
  • Anomaly detection: You can use clustering to find outliers. We're going to learn about this in Chapter 6, Anomaly Detection. Anomaly dete...
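As a rough sketch of the "generating training data" use case in the list above, the cluster assignments from k-means can be attached to unlabeled rows as provisional class labels. The data is synthetic and the column names (`feature_1`, `feature_2`, `label`) are hypothetical, chosen only for this illustration.

```r
set.seed(1)

# Unlabeled synthetic observations drawn from two well-separated groups.
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 5, sd = 0.5), ncol = 2))
colnames(x) <- c("feature_1", "feature_2")

# Cluster the rows into two groups.
fit <- kmeans(x, centers = 2, nstart = 10)

# Attach each row's cluster number as a provisional class label.
labeled <- data.frame(x, label = factor(fit$cluster))

# Inspect the result: number of rows per label, and the first few rows.
print(table(labeled$label))
print(head(labeled))
```

A data frame like `labeled` could then be passed to a supervised learner. The labels are only as good as the clustering, so they would normally be checked before being trusted.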

Table of contents

  1. Preface
  2. Chapter 1
  3. Chapter 2
  4. Chapter 3
  5. Chapter 4
  6. Chapter 5
  7. Chapter 6
  8. Appendix
Citation styles for Applied Unsupervised Learning with R

APA 6 Citation

Malik, A., & Tuckfield, B. (2019). Applied Unsupervised Learning with R (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Original work published 2019)

Chicago Citation

Malik, Alok, and Bradford Tuckfield. (2019) 2019. Applied Unsupervised Learning with R. 1st ed. Packt Publishing. https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf.

Harvard Citation

Malik, A. and Tuckfield, B. (2019) Applied Unsupervised Learning with R. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/955535/applied-unsupervised-learning-with-r-uncover-hidden-relationships-and-patterns-with-kmeans-clustering-hierarchical-clustering-and-pca-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Malik, Alok, and Bradford Tuckfield. Applied Unsupervised Learning with R. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.