Computer Science

Unsupervised Learning

Unsupervised learning is a type of machine learning in which a model is trained on unlabeled data, without specific guidance or feedback. The goal is to find hidden patterns or structures within the data, such as clustering similar data points or reducing dimensionality. This approach is useful for exploring and understanding complex datasets without predefined categories or labels.
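
As a minimal illustration of both ideas mentioned above (grouping similar data points and reducing dimensionality), the sketch below applies scikit-learn to synthetic, unlabeled data; the dataset and all parameter choices are illustrative assumptions rather than part of any excerpt.

```python
# Minimal sketch: clustering and dimensionality reduction on unlabeled data.
# The synthetic dataset and the parameter values are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# "Unlabeled" data: blobs are generated with labels, but we deliberately ignore them.
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

# Clustering: group similar points without any supervision.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project the 5-D points onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

print(np.bincount(clusters))   # size of each discovered cluster
print(X_2d.shape)              # (300, 2)
```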

Written by Perlego with AI-assistance

7 Key excerpts on "Unsupervised Learning"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), and each one adds context and meaning to key research topics.
  • Knowledge Discovery in the Social Sciences

    ...PART IV DATA MINING: Unsupervised Learning. Chapter 6: CLUSTER ANALYSIS. Machine learning refers to the ability of computer systems to progressively improve their performance on data analytical tasks by using computer science and statistical techniques to learn from data (Samuel 1959). It builds models and algorithms by learning and improving from data. Machine learning is best for computing tasks in which it is difficult to design models with explicit structure and algorithms with good performance. In other words, the researchers do not have enough information beforehand to design explicit models that specify the relationships among variables or cases. They thus learn from the data to sort out the hidden patterns and structures, and learn from these patterns and structures to build models. SUPERVISED LEARNING AND UNSUPERVISED LEARNING. Supervised learning and unsupervised learning are the two main types of tasks in machine learning. The main difference between them is that supervised learning starts with knowledge of what the output values for our samples should be, whereas unsupervised learning does not have explicit outputs to predict. The goal of supervised learning is to find and learn a computational function that best approximates the relationship between some attributes and the output, or outcome variable, in the data. The objective of unsupervised learning is to infer and reveal the hidden structure in data. This chapter focuses on unsupervised learning. Supervised learning is discussed in part 5. The goal of unsupervised learning is to model the underlying structure of the data when you have only input data and no corresponding outputs, or outcome variables. The process is called unsupervised learning because, in the absence of explicit outcomes to predict, no supervision or teaching takes place...

  • Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare

    ...10 Unsupervised Learning. 10.1 Needs of Unsupervised Learning. Unlike supervised learning, in unsupervised learning there are no correct answers. The goal of unsupervised learning is to identify or simplify data structure. Unsupervised learning is of growing importance in a number of fields; examples include grouping breast cancer patients by their genetic markers, shoppers by their browsing and purchase histories, and movie viewers by the ratings they assign. We may want to organize documents into different mutually exclusive or overlapping categories, or we might just want to visualize the data. It is often easier to obtain unlabeled data than labeled data, which often require human intervention. Unsupervised learning problems can be further divided into clustering, association, and anomaly detection. A clustering problem occurs when we want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. An association rule learning problem is one where we want to discover rules that describe connections in large portions of our data, e.g., people who buy product A may also tend to buy product B. The third type of problem, anomaly detection or outlier detection, involves identifying items, events, or observations that do not conform to an expected pattern, such as instances of bank fraud, structural defects, medical problems, or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In particular, in the context of abuse of computer networks and network intrusion detection, the interesting objects are often not rare objects but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and for this reason there are many outlier detection methods (Zimek, 2017). 10.2 Association or Link Analysis. In many situations, finding causal relationships is the goal...
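
    The excerpt above distinguishes three unsupervised tasks: clustering, association, and anomaly detection. As a rough sketch of the third, the snippet below flags unusual values with scikit-learn's IsolationForest; the synthetic data, the choice of method, and the contamination value are assumptions made purely for illustration.

```python
# Sketch of the anomaly-detection task described above, using an
# IsolationForest (one of many possible outlier-detection methods).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Mostly "normal" transaction amounts around 100, plus a few extreme values.
normal = rng.normal(loc=100.0, scale=10.0, size=(200, 1))
outliers = np.array([[400.0], [5.0], [350.0]])
X = np.vstack([normal, outliers])

# contamination is an assumed fraction of anomalies, chosen for illustration.
detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = detector.predict(X)       # +1 = inlier, -1 = flagged anomaly
print(X[labels == -1].ravel())     # should include the extreme values
```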

  • Essentials of Deep Learning and AI

    Experience Unsupervised Learning, Autoencoders, Feature Engineering, and Time Series Analysis with TensorFlow, Keras, and scikit-learn

    • Shashidhar Soppin, B N Chandrashekar, Dr. Manjunath Ramachandra (Authors)
    • 2021 (Publication Date)
    • BPB Publications (Publisher)

    ...CHAPTER 3. System Analysis with Machine Learning/Un-Supervised Learning. We might have encountered questions such as “why unsupervised learning?”, “what is it all about?”, and “how is it different from supervised learning algorithms and methods?”. In real-world scenarios, labeled data is often not available. The data patterns and data format are also often not well defined or available to us. In these cases, we turn to unsupervised learning mechanisms and algorithms to solve the mystery. The Fashion-MNIST dataset is an example mapped to labeled classes and can easily serve as a reference for supervised learning algorithms. But this same data set can be used without the “labels” for unsupervised learning mechanisms and algorithms. As explained in the earlier section, machine learning is defined as one of the branches of computer science in which algorithms learn from the data or data sets available to them. Most machine learning algorithms are designed around these data sets, and these predominant algorithms work on labeled data. But there are some algorithms that work without labeled data. Using unlabeled data, the system can extract patterns and use them for interpretation. Using unlabeled data is a bit more challenging and tricky compared to labeled data sets...
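
    As a rough sketch of the idea in this excerpt, the snippet below loads Fashion-MNIST with Keras, discards the labels, and clusters the raw pixel vectors with scikit-learn; the choice of k-means and the parameter values are assumptions for illustration, not the book's own code.

```python
# Sketch: using Fashion-MNIST *without* its labels, as the excerpt suggests.
# Requires tensorflow and scikit-learn; the clustering choices are assumptions.
import numpy as np
from tensorflow.keras.datasets import fashion_mnist
from sklearn.cluster import MiniBatchKMeans

(x_train, _), _ = fashion_mnist.load_data()

# Flatten the 28x28 images into vectors and scale to [0, 1]; no labels are used.
X = x_train.reshape(len(x_train), -1).astype("float32") / 255.0

# Ask for 10 clusters (the dataset has 10 classes, but the model never sees them).
km = MiniBatchKMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))   # how many images landed in each cluster
```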

  • Machine Learning

    An Algorithmic Perspective, Second Edition

    ...CHAPTER 14: Unsupervised Learning. Many of the learning algorithms that we have seen to date have made use of a training set that consists of a collection of labelled target data, or at least (for evolutionary and reinforcement learning) some scoring system that identifies whether or not a prediction is good. Targets are obviously useful, since they enable us to show the algorithm the correct answer to possible inputs, but in many circumstances they are difficult to obtain; they could, for instance, involve somebody labelling each instance by hand. In addition, it doesn’t seem to be very biologically plausible: most of the time when we are learning, we don’t get told exactly what the right answer should be. In this chapter we will consider exactly the opposite case, where there is no information about the correct outputs available at all, and the algorithm is left to spot some similarity between different inputs for itself. Unsupervised learning is a conceptually different problem from supervised learning. Obviously, we can’t hope to perform regression: we don’t know the outputs for any points, so we can’t guess what the function is. Can we hope to do classification, then? The aim of classification is to identify similarities between inputs that belong to the same class. There isn’t any information about the correct classes, but if the algorithm can exploit similarities between inputs in order to cluster inputs that are similar together, this might perform classification automatically. So the aim of unsupervised learning is to find clusters of similar inputs in the data without being explicitly told that these datapoints belong to one class and those to a different class. Instead, the algorithm has to discover the similarities for itself...
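
    As a small illustration of the point made above, that clustering similar inputs can end up performing classification automatically, the sketch below clusters the Iris measurements without labels and only afterwards checks how well the clusters agree with the hidden species. The dataset and the agreement metric are illustrative choices, not taken from the excerpt.

```python
# Sketch: cluster without labels, then check (after the fact) how well the
# discovered clusters line up with the hidden classes.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y_true = load_iris(return_X_y=True)

# The algorithm only ever sees X; y_true is used purely for evaluation.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# A score of 1.0 would mean the clusters match the true species perfectly.
print("Adjusted Rand index:", adjusted_rand_score(y_true, clusters))
```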

  • Behavior Analysis with Machine Learning Using R

    ...6 Discovering Behaviors with Unsupervised Learning (DOI: 10.1201/9781003203469-6). So far, we have been working with supervised learning methods, that is, models for which the training instances have two elements: (1) a set of input values (features) and (2) the expected output (label). As mentioned in chapter 1, there are other types of machine learning methods, and one of those is unsupervised learning, which is the topic of this chapter. In unsupervised learning, the training instances do not have a response variable (e.g., a label). Thus, the objective is to extract knowledge from the available data without any type of guidance (supervision). For example, given a set of variables that characterize a person, we would like to find groups of people with similar behaviors. For physical activity behaviors, this could be done by finding groups of very active people versus groups of people with low physical activity. Those groups can be useful for delivering targeted suggestions or services, thus enhancing and personalizing the user experience. This chapter starts with one of the most popular unsupervised learning algorithms: k-means clustering. Next, an example of how this technique can be applied to find groups of students with similar characteristics is presented. Then, association rules mining is presented, which is another type of unsupervised learning method. Finally, association rules are used to find criminal patterns from a homicide database. 6.1 k-means Clustering (kmeans_steps.R). This is one of the most commonly used unsupervised methods due to its simplicity and efficacy. Its objective is to find groups of points such that points in the same group are similar and points from different groups are as dissimilar as possible. The number of groups k needs to be defined a priori. The method is based on computing distances to centroids. The centroid of a set of points is computed by taking the mean of each of their features...
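
    The excerpt describes the two alternating steps of k-means: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. Below is a minimal NumPy sketch of exactly those steps; it is an independent illustration, not the book's kmeans_steps.R script, which is written in R.

```python
# Minimal from-scratch sketch of the k-means steps described above.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k points chosen at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # stop when centroids settle
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic groups; k is fixed a priori, as the excerpt notes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)   # should land near (0, 0) and (3, 3)
```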

  • Data Science

    The Executive Summary - A Technical Book for Non-Technical Professionals

    • Field Cady (Author)
    • 2020 (Publication Date)
    • Wiley (Publisher)

    ...But they make those decisions in fractions of a second, consistently, and at scale. Any time users are presented with a recommendation engine, or software behaves differently to anticipate somebody's actions, or the computer flags something as being worthy of human attention, the logic making those decisions is liable to be a machine learning model. 5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers. Machine learning falls into two broad categories called “supervised learning” and “unsupervised learning.” Both of them are based on finding patterns in historical data. In supervised learning we have something specific (often called the “target” variable) that we are trying to predict about the data, and we know what the right predictions are for our historical data. The goal is specifically to find patterns that can be used to predict the target variable for other data in the future, when we won't know what the right answer is. This is by far the most high-profile, clearly useful application of machine learning. In unsupervised learning there is no specific target variable that we are trying to predict. Instead the goal is to identify latent structures in the data, like the datapoints tending to fall into several natural clusters. Often information like that is not an end in itself, but gets used as input for supervised learning. The simplest and most important type of supervised learning is the binary classifier, where the target variable we are trying to predict is a yes/no label, typically thought of as 1 or 0. Typically the labeled data is divided into “training data” and “testing data.” The algorithm learns how to give the labels by pattern-matching to the training data, and we measure how well it performs by seeing the labels it gives to the testing data...
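
    As a brief sketch of the supervised workflow just described (a binary 1/0 target, labeled data split into training and testing sets, and accuracy measured on the held-out test set), the snippet below uses synthetic data and logistic regression; both choices are assumptions for illustration.

```python
# Sketch of the binary-classification workflow: train on one slice of labeled
# data, measure performance on the held-out slice.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic labeled data with a yes/no (1/0) target variable.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Learn label patterns from the training data only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Judge the model by the labels it gives to the unseen testing data.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```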

  • A Computational Approach to Statistical Learning
    • Taylor Arnold, Michael Kane, Bryan W. Lewis (Authors)
    • 2019 (Publication Date)

    ...For example, as we show in Chapter 8, training neural networks with stochastic gradient descent is closer to an art form than a push-button algorithm that can be obfuscated from the user. Nearly every chapter in this text shows how understanding the algorithm used to estimate a model often provides essential insight into the model’s use cases and motivation. Additionally, increasingly large data sources have made it difficult or impossible, from a purely computational standpoint, to apply every model to any dataset. Knowledge of the computational details allows one to know exactly which methods are appropriate for a particular scale of data. 1.2 Statistical learning. Statistical learning is the process of teaching computers to “learn” by automatically extracting knowledge from available data. It is closely associated with, if not outright synonymous with, the fields of pattern recognition and machine learning. Learning occupies a prominent place within artificial intelligence, which broadly encompasses all forms of computer intelligence, both hand-coded and automatically adapted through observed data. We focus in this text on the subfield of supervised learning. The goal is to find patterns in available inputs in order to make accurate predictions on new, unseen data. For this reason, models used in supervised learning are often called predictive models. Take the task of building an automatic spam filter. As a starting point, we could label a small dataset of messages by hand. Then a statistical learning model is built that discovers which features in the messages are indicative of the message being labeled as spam. The model can then be used to automatically classify new messages without manual intervention by the user...
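
    As a small sketch of the spam-filter workflow described above (hand-label a few messages, learn which features indicate spam, then classify new messages automatically), the snippet below uses a bag-of-words representation with a naive Bayes classifier; the tiny corpus and the pipeline choices are illustrative assumptions.

```python
# Sketch of the spam-filter workflow: hand-labeled messages, a model that
# learns indicative features, then automatic classification of new text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win a free prize now", "claim your free money",    # labeled spam
    "meeting moved to 3pm", "lunch tomorrow at noon",    # labeled not spam
]
labels = [1, 1, 0, 0]   # 1 = spam, 0 = not spam

# Bag-of-words features feeding a naive Bayes classifier.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

# Classify a new, unlabeled message without manual intervention.
print(spam_filter.predict(["free prize waiting for you"]))   # likely [1]
```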