The Unsupervised Learning Workshop
eBook - ePub

Aaron Jones, Christopher Kruger, Benjamin Johnston

  1. 550 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Learning how to apply unsupervised algorithms on unlabeled datasets from scratch can be easier than you thought with this beginner's workshop, featuring interesting examples and activities.

Key Features
• Get familiar with the ecosystem of unsupervised algorithms
• Learn interesting methods to simplify large amounts of unorganized data
• Tackle real-world challenges, such as estimating the population density of a geographical area

Book Description
Do you find it difficult to understand how popular companies like WhatsApp and Amazon find valuable insights from large amounts of unorganized data? The Unsupervised Learning Workshop will give you the confidence to deal with cluttered and unlabeled datasets, using unsupervised algorithms in an easy and interactive manner.

The book starts by introducing the most popular clustering algorithms of unsupervised learning. You'll find out how hierarchical clustering differs from k-means, along with understanding how to apply DBSCAN to highly complex and noisy data. Moving ahead, you'll use autoencoders for efficient data encoding.

As you progress, you'll use t-SNE models to extract high-dimensional information into a lower dimension for better visualization, in addition to working with topic modeling for implementing natural language processing (NLP). In later chapters, you'll find key relationships between customers and businesses using Market Basket Analysis, before going on to use Hotspot Analysis for estimating the population density of an area.

By the end of this book, you'll be equipped with the skills you need to apply unsupervised algorithms on cluttered datasets to find useful patterns and insights.

What you will learn
• Distinguish between hierarchical clustering and the k-means algorithm
• Understand the process of finding clusters in data
• Grasp interesting techniques to reduce the size of data
• Use autoencoders to decode data
• Extract text from a large collection of documents using topic modeling
• Create a bag-of-words model using the CountVectorizer

Who this book is for
If you are a data scientist who is just getting started and want to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of the Python programming language is recommended, as you'll be editing classes and functions instead of creating them from scratch.

Information

1. Introduction to Clustering

Overview
Finding insights and value in data is the ambitious promise behind the rise of machine learning. Within machine learning, there are approaches to understanding dense information in deeper ways, as well as approaches to predicting outcomes based on changing inputs. In this chapter, we will learn what supervised learning and unsupervised learning are, and how they are applied to different use cases. Once you have a deeper understanding of where unsupervised learning is useful, we will walk through some foundational techniques that provide value quickly.
By the end of this chapter, you will be able to implement k-means clustering algorithms using built-in Python packages and calculate the silhouette score.
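To make the silhouette score mentioned above a little more concrete, here is a minimal from-scratch sketch in plain Python. It is not the book's implementation and not a library API: the function name `silhouette_score`, the toy points, and both labelings are assumptions made purely for illustration, and Euclidean distance is assumed throughout. In practice, a library routine (scikit-learn ships a function of the same name) would normally be used instead.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance from p to the other members of its own cluster
        # (p's distance to itself is 0, so dividing by len(own) - 1 is correct).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance from p to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the real groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

The score always lies between -1 and 1. Values near 1 indicate compact, well-separated clusters, while values near or below 0 suggest that points have been placed in the wrong cluster, which is why the second labeling scores far lower than the first.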

Introduction

Have you ever been asked to take a look at some data and come up empty-handed? Maybe you weren't familiar with the dataset, or maybe you didn't even know where to start. This may have been extremely frustrating, and even embarrassing, depending on who asked you to take care of the task.
You are not alone, and, interestingly enough, there are many times when the data itself is simply too confusing to be made sense of. As you try to figure out what all those numbers in your spreadsheet mean, you're most likely mimicking what many unsupervised algorithms do when they try to find meaning in data. The reality is that many unprocessed real-world datasets may not have any useful insights. One example to consider is the fact that these days, individuals generate massive amounts of granular data on a daily basis – whether it's their actions on a website, their purchase history, or what apps they use on their phone. If you were to look at this information on the surface, it would be a big, unorganized mess with no hope of clarity. Don't fret, however; this book will prepare you for such tall tasks so that you'll never be frustrated again when dealing with data exploration tasks, no matter how large.
For this book, we have developed some best-in-class content to help you understand how unsupervised algorithms work and where to use them. We'll cover some of the foundations of finding clusters in your data, how to reduce the size of your data so it's easier to understand, and how each of these sides of unsupervised learning can be applied in the real world. We hope you will come away from this book with a strong real-world understanding of unsupervised learning, the problems that it can solve, and those it cannot.

Unsupervised Learning versus Supervised Learning

Unsupervised learning is the field of practice that helps find patterns in cluttered data and is one of the most exciting areas of development in machine learning today. If you have explored machine learning bookwork before, you are probably familiar with the common division of problems into supervised and unsupervised learning. Supervised learning encompasses the set of problems in which a labeled dataset is used either to classify data (for example, predicting smokers and non-smokers, if you're looking at a lung health dataset) or to find a pattern in clearly defined data (for example, predicting the sale price of a home based on how many bedrooms it has). This model most closely mirrors an intuitive human approach to learning.
For example, if you had a basic understanding of cooking and wanted to learn how not to burn your food, you could build a dataset by putting your food on the burner and seeing how long it takes (input) for your food to burn (output). Eventually, as you continue to burn your food, you will build a mental model of when burning will occur and how to avoid it in the future. Development in supervised learning was once fast-paced and valuable, but it has simmered down in recent years. Many of the obstacles around getting to know your data have already been tackled and are listed in the following image:
Figure 1.1: Differences between unsupervised and supervised learning
Conversely, unsupervised learning encompasses the set of problems in which you have a tremendous amount of data that is unlabeled. Labeled data, in this case, is data that comes with a supplied "target" outcome that you are trying to relate to the supplied input data. For instance, in the preceding example, you know that your "target outcome" is whether your food was burned; this is an example of labeled data. Unlabeled data is data for which you do not know the "target" outcome; you have only the input data.
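The difference between the two kinds of data can be sketched in a few lines of plain Python. Everything here is hypothetical and chosen only for illustration: the cooking times, the names `labeled`, `unlabeled`, `learn_burn_threshold`, and `predict_burned`, and the assumption that a single cut-off time cleanly separates burned from unburned food.

```python
# Labeled data: every input (minutes on the burner) is paired with a
# target outcome (was the food burned?). All values are made up.
labeled = [(2, False), (4, False), (6, False), (8, True), (10, True), (12, True)]

def learn_burn_threshold(examples):
    """Place the cut-off midway between the longest safe time and the
    shortest burned time (assumes the two groups do not overlap)."""
    longest_safe = max(t for t, burned in examples if not burned)
    shortest_burned = min(t for t, burned in examples if burned)
    return (longest_safe + shortest_burned) / 2

threshold = learn_burn_threshold(labeled)  # 7.0 for the data above

def predict_burned(minutes):
    """Supervised prediction: apply the cut-off learned from the labels."""
    return minutes > threshold

# Unlabeled data: the same inputs with no target attached. There is no
# outcome to learn a cut-off from, only structure in the inputs themselves.
unlabeled = [2, 4, 6, 8, 10, 12]

print(threshold, predict_burned(9), predict_burned(3))
```

With labels, the "model" can be checked against known outcomes; with the unlabeled list, the only option is to look for groupings in the inputs, which is the job of the clustering techniques in this chapter.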
Building upon the previous example, imagine you were just dropped on planet Earth with zero knowledge of how cooking works. You are given 100 days, a stove, and a fridge full of food without any instructions on what to do. Your initial exploration of a kitchen could go in infinite directions. On day 10, you may finally learn how to open the fridge; on day 30, you may learn that food can go on the stove; and after many more days, you may unwittingly make an edible meal. As you can see, trying to find meaning in a kitchen devoid of adequate informational structure leads to very noisy data that is completely irrelevant to actually preparing a meal.
Unsupervised learning can be an answer to this problem. Looking back at your 100 days of data, you can use clustering to find patterns of similar attributes across days and deduce which foods are similar and may lead to a "good" meal. However, unsupervised learning isn't a magical answer. Simply finding clusters can be just as likely to help you find pockets of similar, yet ultimately useless, data. Expanding on the cooking example, we can illustrate this shortcoming with the concept of the "third variable". Just because you have a cluster of really great recipes doesn't mean they are infallible. During your research, you may have found a unifying factor that all good meals were cooked on a stove. This does not mean that every meal cooked on a stove will be good, and you cannot easily jump to that conclusion for all future scenarios.
This challenge is what makes unsupervised learning so exciting. How can we find smarter techniques to speed up the process of finding clusters of information that are beneficial to our end goals? The following sections will help us answer this question.

Clustering

Clustering is the overarching process of finding groups of similar data that exist in your dataset, which can be extremely valuable if you are trying to find its underlying meaning. If you were a store owner and you wanted to understand which customers are more valuable without a set idea of what valuable is, clustering would be a great place to start to find patterns in your data. You may have a few high-level ideas of what denotes a valuable customer, but you aren't entirely sure in the face of a large mountain of available data. Through clustering, you can find commonalities among similar groups in your data. For example, if you look more deeply at a cluster of similar people, you may learn that everyone in that group visits your website for longer periods of time than others. This can show you what the value is and also provide a clean sample for future supervised learning experiments.
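As an illustration of the store-owner scenario, the sketch below runs a bare-bones k-means (Lloyd's algorithm) in plain Python. It is a simplified stand-in, not the book's code: the customer figures, the feature choice (minutes per visit, visits per month), and the function name `kmeans` are all assumptions, and the first k points are used as starting centroids for reproducibility, whereas libraries such as scikit-learn default to a smarter initialization (k-means++).

```python
import math

def kmeans(points, k, max_iter=100):
    """Bare-bones Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids will too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical customers: (minutes per visit, visits per month).
customers = [
    (2, 1), (3, 2), (2, 2), (3, 1),        # brief, infrequent visitors
    (20, 9), (22, 10), (21, 8), (23, 9),   # long, frequent visitors
]
labels, centroids = kmeans(customers, k=2)
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [(2.5, 1.5), (21.5, 9.0)]
```

Note that both starting centroids fall inside the first group here, yet the assignment and update steps still pull one centroid over to the second group within a couple of iterations. On less cleanly separated data, a poor starting point can lead to a worse result, which is one reason library implementations put effort into initialization.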

Identifying Clusters

The following image shows two scatterplots:
Figure 1.2: Two distinct scatterplots
The following image sep...

Table of Contents

  1. The Unsupervised Learning Workshop
  2. Preface
  3. 1. Introduction to Clustering
  4. 2. Hierarchical Clustering
  5. 3. Neighborhood Approaches and DBSCAN
  6. 4. Dimensionality Reduction Techniques and PCA
  7. 5. Autoencoders
  8. 6. t-Distributed Stochastic Neighbor Embedding
  9. 7. Topic Modeling
  10. 8. Market Basket Analysis
  11. 9. Hotspot Analysis
  12. Appendix
Citation styles for The Unsupervised Learning Workshop

APA 6 Citation

Jones, A., Kruger, C., & Johnston, B. (2020). The Unsupervised Learning Workshop (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/1694595/the-unsupervised-learning-workshop-pdf (Original work published 2020)

Chicago Citation

Jones, Aaron, Christopher Kruger, and Benjamin Johnston. (2020) 2020. The Unsupervised Learning Workshop. 1st ed. Packt Publishing. https://www.perlego.com/book/1694595/the-unsupervised-learning-workshop-pdf.

Harvard Citation

Jones, A., Kruger, C. and Johnston, B. (2020) The Unsupervised Learning Workshop. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/1694595/the-unsupervised-learning-workshop-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Jones, Aaron, Christopher Kruger, and Benjamin Johnston. The Unsupervised Learning Workshop. 1st ed. Packt Publishing, 2020. Web. 14 Oct. 2022.