Practical Machine Learning with Spark
eBook - ePub

Practical Machine Learning with Spark

Uncover Apache Spark's Scalable Performance with High-Quality Algorithms Across NLP, Computer Vision and ML

Gourav Gupta, Dr. Manish Gupta, Dr. Inder Singh Gupta



Book Information

Explore the cosmic secrets of Distributed Processing for Deep Learning applications.

Key Features
  • In-depth practical demonstration of ML/DL concepts using a distributed framework.
  • Covers graphical illustrations and visual explanations for ML/DL pipelines.
  • Includes a live codebase for each of the NLP, computer vision, and machine learning applications.

Description
This book provides the reader with an up-to-date explanation of Machine Learning and an in-depth, comprehensive, and straightforward understanding of the architectural techniques used to evaluate data and anticipate future insights from it using Apache Spark. The book walks readers through setting up Hadoop and Spark installations on-premises, on Docker, and on AWS. Readers will learn about Spark MLlib and how to use it in supervised and unsupervised machine learning scenarios. With the help of Spark, some of the most prominent technologies, such as natural language processing and computer vision, are evaluated and demonstrated in a realistic setting. Using the capabilities of Apache Spark, this book discusses the fundamental components that underlie each of these natural language processing, computer vision, and machine learning technologies, as well as how you can incorporate these technologies into your business processes. Towards the end of the book, readers will learn about several deep learning frameworks, such as TensorFlow and PyTorch. Readers will also learn to execute distributed processing of deep learning problems using Spark.

What you will learn
  • Learn how to get started with machine learning projects using Spark.
  • See how to use Spark MLlib's design for machine learning and deep learning operations.
  • Use Spark in tasks involving NLP, unsupervised learning, and computer vision.
  • Experiment with Spark in a cloud environment and with AI pipeline workflows.
  • Run deep learning applications on a distributed network.

Who this book is for
This book is valuable for data engineers, machine learning engineers, data scientists, data architects, business analysts, and technical consultants worldwide. It would be beneficial to have some familiarity with the fundamentals of Hadoop and Python.

Table of Contents
1. Introduction to Machine Learning
2. Apache Spark Environment Setup and Configuration
3. Apache Spark
4. Apache Spark MLlib
5. Supervised Learning with Spark
6. Un-Supervised Learning with Apache Spark
7. Natural Language Processing with Apache Spark
8. Recommendation Engine with Distributed Framework
9. Deep Learning with Spark
10. Computer Vision with Apache Spark


Information

Year
2022
ISBN
9789391392086

CHAPTER 1

Introduction to Machine Learning

“Field of study that gives computers the capability to learn without being explicitly programmed.”
— Arthur Samuel

Introduction

Over the last two decades, there has been steady progress in Artificial Intelligence (AI) and its related sub-branches, such as Machine Learning (ML), Statistical Modelling (SM), and Deep Learning (DL). These technologies power many applications that improve people's lives and meet their day-to-day needs in domains such as bioinformatics, radiology, agriculture, finance, astronomy, banking, healthcare, geo-informatics, seismology, and space exploration. ML extends what manual operations can achieve by enabling machines to learn automatically from historical data, that is, from past experience. The main objective of this book is to educate readers about the fundamentals, advances, and real-life applications of ML using a distributed framework. This chapter gives an in-depth account of the journey of AI and of its taxonomy. The term AI refers to systems that imitate intelligent behavior by extracting meaningful information and patterns from their inputs. Self-driving cars, for example, rely on AI, especially vision-based technology, to learn to make sound decisions from what they observe; such systems are ideal examples of AI. A report published by Gartner in 2019 indicates that Intelligent Systems (IS) and their related verticals will become a major epicenter and one of the most decisive emerging technologies in the coming years. In the future, most tedious problems are expected to be solved with the help of AI and ML. Across the globe, AI has become a subject of interest among researchers, data scientists, data analysts, industry experts, and academics for solving difficult real-world problems. This chapter also covers the evolution of ML, the types of ML, and its emerging applications and future scope.
In addition, this chapter includes a concise discussion of DL in connection with AI applications.

Structure

In this chapter, we will discuss the following topics:
  • Evolution of machine learning
  • Fundamentals and definition of machine learning
  • Types of machine learning algorithms
  • Applications of machine learning
  • Future of machine learning

Objectives

After studying this chapter, readers will be able to:
  • Learn about the history of machine learning.
  • Get an understanding of the modern definition of machine learning.
  • Grasp the different types of machine learning and their algorithms.
  • Understand the application of machine learning in various fields.
  • Know the future scope of machine learning.

Evolution of Machine Learning

The origins of AI and ML are interconnected. Hence, to give readers a solid foundation, this section presents a detailed history of ML and AI. However, the primary objective of this book is to make readers conversant with practical, real-world scenarios of ML with Apache Spark.
The term “Machine Learning” is credited to the American engineer Arthur Samuel. From 1949 to the late 1960s, he did pioneering research on teaching computers to make decisions on their own. In 1952, he wrote a program for the two-player game of checkers that ran on computers with limited memory; it used a scoring function to measure each side's chances of winning, together with alpha-beta pruning to limit its search. The program selected its moves using the minimax strategy, and Samuel added mechanisms such as “rote learning”, in which the program remembered positions it had already seen, to make it play better. In 1959, Samuel introduced the term “Machine Learning”. Meanwhile, in 1957, Frank Rosenblatt of the Cornell Aeronautical Laboratory combined Donald Hebb's model of brain cell interaction with Samuel's machine learning ideas to design the perceptron, the first neural network for computers. The perceptron algorithm was first simulated on an IBM 704 computer and then built into custom hardware named the Mark I Perceptron. It was used for image recognition, but it still had limitations in recognizing many kinds of visual patterns, such as faces.
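To make the perceptron idea concrete, here is a minimal Python sketch of the perceptron learning rule in its usual textbook form; it is not a reproduction of Rosenblatt's Mark I implementation. Labels are assumed to be +1/-1, and the names `train_perceptron` and `predict` are hypothetical. Whenever a sample is misclassified, the weights and bias are nudged towards it; on linearly separable data such as logical AND, the loop stops once a full pass makes no mistakes.

```python
def predict(weights, bias, x):
    """Return +1 if the weighted sum plus bias is non-negative, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else -1


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Perceptron learning rule: update the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:
                # Move the decision boundary towards the misclassified sample.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly: converged
            break
    return weights, bias


# Hypothetical toy data: logical AND with labels in {-1, +1}.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
weights, bias = train_perceptron(X, Y)
print([predict(weights, bias, x) for x in X])  # prints [-1, -1, -1, 1]
```

A single perceptron can only draw one straight decision boundary, which is one reason the multi-layer networks described next were needed.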
In the 1960s, researchers began to use multiple layers in neural networks (NN), giving them the capacity to solve more complex problems with better precision. This multilayer approach opened up many new ways to improve neural network learning, through feedforward neural networks and backpropagation.
In 1967, the nearest neighbor algorithm was introduced for basic pattern recognition and was applied to finding more efficient routes for traveling salespeople. In 1970, the backpropagation algorithm was developed to adjust networks with hidden layers of neurons so as to minimize their errors. This algorithm is used to train Deep Neural Networks (DNNs).
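The way backpropagation adjusts a network to minimize its error can be sketched in plain Python for a tiny network with two inputs, one hidden layer of two sigmoid neurons, and one sigmoid output. This is an illustrative sketch under stated assumptions (squared-error loss, sigmoid activations, and hypothetical helper names `init_params`, `forward_backward`, and `sgd_step`), not how DL frameworks implement it: the backward pass applies the chain rule from the loss back to every weight, and one gradient-descent step moves each weight against its gradient.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(seed=0):
    """Random weights for a 2-input, 2-hidden-neuron, 1-output network."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # W1[j][i]: input i -> hidden j
    b1 = [rng.uniform(-1, 1) for _ in range(2)]
    W2 = [rng.uniform(-1, 1) for _ in range(2)]  # W2[j]: hidden j -> output
    b2 = rng.uniform(-1, 1)
    return W1, b1, W2, b2


def forward_backward(params, x, y):
    """Return the loss 0.5 * (out - y) ** 2 and its gradient for every parameter."""
    W1, b1, W2, b2 = params
    # Forward pass: hidden activations, then the output.
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    out = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    loss = 0.5 * (out - y) ** 2
    # Backward pass: chain rule, from the output error back to the first layer.
    d_out = (out - y) * out * (1 - out)  # dLoss / d(output pre-activation)
    grad_W2 = [d_out * h[j] for j in range(2)]
    grad_b2 = d_out
    d_h = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]  # dLoss / d(hidden pre-activation)
    grad_W1 = [[d_h[j] * x[i] for i in range(2)] for j in range(2)]
    grad_b1 = list(d_h)
    return loss, (grad_W1, grad_b1, grad_W2, grad_b2)


def sgd_step(params, x, y, lr=0.1):
    """Move every parameter a small step against its gradient to reduce the error."""
    W1, b1, W2, b2 = params
    _, (gW1, gb1, gW2, gb2) = forward_backward(params, x, y)
    new_W1 = [[W1[j][i] - lr * gW1[j][i] for i in range(2)] for j in range(2)]
    new_b1 = [b1[j] - lr * gb1[j] for j in range(2)]
    new_W2 = [W2[j] - lr * gW2[j] for j in range(2)]
    return new_W1, new_b1, new_W2, b2 - lr * gb2


params = init_params(seed=1)
x, y = (1.0, 0.0), 1.0
before, _ = forward_backward(params, x, y)
after, _ = forward_backward(sgd_step(params, x, y), x, y)
print(before, after)  # the loss on this sample shrinks after one step
```

Repeating `sgd_step` over many samples and epochs is what "training" means here; a finite-difference check of the gradients is a common way to confirm that a hand-written backward pass is correct.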
During the 1970s and 1980s, AI researchers and computer scientists worked together on neural network research, while some researchers and engineers began pursuing ML as a new path. By the early 1980s, ML and AI had taken separate paths: AI focused mainly on logical and knowledge-based approaches, while ML focused on neural-network-based algorithms.
In the 1990s, ML flourished because of the large amounts of data made available through the Internet. In 1990, Robert Schapire developed the boosting algorithm, which reduces bias in supervised learning by boosting weak learners. In boosting, a set of weak learners is combined into a single strong learner, that is, a classifier that is well correlated with the true classification; the final result combines many simple models (weak learners). There are many types of boosting algorithms, such as AdaBoost, BrownBoost, LPBoost, MadaBoost, TotalBoost, XGBoost, LogitBoost, and AnyBoost. Various types of boosting algorithms are discussed in detail later in this chapter.
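The idea of combining weak learners into a strong one can be illustrated with AdaBoost, one of the variants listed above (Freund and Schapire's later algorithm, not Schapire's original 1990 construction). The sketch below is a minimal pure-Python version under simplifying assumptions: a single numeric feature, labels in {-1, +1}, and one-level decision stumps as the weak learners; all function names are hypothetical. No single stump can separate the toy data, but a weighted vote of three stumps can.

```python
import math


def stump_predict(x, threshold, polarity):
    """A decision stump: predict `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the (threshold, polarity) pair with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    weights = [1.0 / len(xs)] * len(xs)  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = max(error, 1e-10)  # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)  # vote weight of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Strong learner: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: no single threshold separates the +1 and -1 groups.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # prints [1, 1, -1, -1, 1, 1]
```

In Spark MLlib, boosting is available out of the box as gradient-boosted trees (`GBTClassifier` and `GBTRegressor`), which is a different boosting variant from the AdaBoost sketch above.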
Next, in 1996, IBM's chess-playing computer “Deep Blue” won its first game against the world champion Garry Kasparov. Deep Blue used custom-built Very Large-Scale Integration (VLSI) chips to execute the alpha-beta search algorithm. In 1997, Sepp Hochreiter and Jürgen Schmidhuber designed the Long Short-Term Memory (LSTM) neural network model, which has been used for training speech recognition systems. An LSTM unit consists of a memory cell with input and output gates and was designed to mitigate the vanishing gradient problem. In 2006, face recognition algorithms were evaluated on 3D face scans, face images, and iris images, and they proved more accurate than earlier facial recognition algorithms.
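Both Samuel's checkers program and Deep Blue searched game trees using minimax with alpha-beta pruning. The following minimal Python sketch shows the technique on a hypothetical toy game tree (nested lists whose leaves are already-scored positions); the `children` and `leaf_score` helpers stand in for a real move generator and scoring function and are assumptions for illustration. Alpha-beta returns the same value as plain minimax but skips branches that cannot change the decision.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, children, score):
    """Minimax search with alpha-beta pruning.

    `children(node)` returns successor positions; `score(node)` is the static
    evaluation (the scoring function) from the maximizing player's view.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return score(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, score))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cut-off: the minimizing player will never allow this line
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, score))
        beta = min(beta, value)
        if beta <= alpha:
            break  # cut-off: the maximizing player already has a better option
    return value


def children(node):
    # Hypothetical toy tree: internal nodes are lists, leaves are plain numbers.
    return node if isinstance(node, list) else []


evaluated = []


def leaf_score(node):
    evaluated.append(node)  # record which leaves the search actually scored
    return node


tree = [[3, 5], [2, 9], [0, 1]]  # maximizer moves first, minimizer replies
best = alphabeta(tree, 10, -math.inf, math.inf, True, children, leaf_score)
print(best, evaluated)  # prints 3 [3, 5, 2, 0]
```

Only four of the six leaves are scored: once the minimizer can answer the second and third moves with 2 and 0, both worse than the 3 already secured, their remaining replies are skipped.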
In the same year, the Canadian computer scientist Geoffrey Hinton popularized the term Deep Learning (DL) and developed a fast, greedy unsupervised learning algorithm for distinguishing text and objects in digital images and videos.
In 2011, Google's deep learning research team, known as “Google Brain”, developed a large-scale deep learning software system named DistBelief for learning to categorize objects in a way similar to how a person does. A year later, the Google X team trained a neural network running on 16,000 computer processors that learned to identify cats in images from YouTube videos automatically.
In 2014, Facebook's research team introduced DeepFace, a facial recognition system that uses DL to recognize human faces in digital images. In 2015, Microsoft released an ML toolkit for distributing ML problems across multiple computers. In 2016, the Google DeepMind team developed AlphaGo to play Go, one of the most complex board games.
Next, in 2017, Google released version 1.0.0 of TensorFlow, Google Brain's second-generation system, which can run on both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) for general-purpose computing. More recently, Google released TensorFlow.js for ML in JavaScript in 2018, followed in 2019 by TensorFlow.js 1.0, TensorFlow 2.0, and TensorFlow Graphics for DL in computer graphics.

Fundamentals and Definition of Machine Learning

This section builds a solid foundation in ML, from its initial definition to its modern definition, along with the basic terminology needed to grasp the fundamentals of ML. As discussed previously, ML has been adapting and expanding its functionality in nearly every automation-related job, so the authors pay particular attention here to the core concepts that strengthen the reader's knowledge of ML. It is also necessary to walk through the journey of ML: its importance, and the traditional and modern approaches to training, validating, and testing a model on a dataset. This book also keeps readers up to date on the real-world challenges, and their respective solutions, faced by intelligence- and analytics-based organizations.
Figure 1.1 depicts the branches of Artificial Intelligence, such as Machine Learning, Neural Networks, and Deep Learning. ML draws on different types of learning, such as Supervised Learning (SL), Semi-Supervised Learning (SSL), Unsupervised Learning (USL), and Reinforcement Learning (RL).
Figure 1.1: Artificial Intelligence with its derived technologies
In NN, a special collection of algorithms is used for training, validating, and testing on patterns or inputs, building on the idea of artificial neurons that work like the neurons of a human brain. For example, voice-to-text conversion uses NN as its backbone; Amazon Alexa, Apple Siri, and Google Home are well-known applications of such smart personal assistants. DL, on the other hand, refers to networks that stack two or more hidden layers to process complex problems with high precision. In general, DL is a form of NN; the difference is that DL makes it easy to customize complex neural architectures and to handle large, cumbersome models. Today, various DL and NN frameworks, such as Keras, Caffe, and TensorFlow, make it easy to get started quickly with these analytic platforms.
The following list introduces the basic terminology that is essential for understanding the concepts of ML:
  • Features or Attributes or Variables: These are the key measurable characteristics of the data that are fed into the system for training and testing a model. For ML algorithms, these features are used as inputs or outputs. For recognizing a human face, for example, attributes such as gender, age, height, lip shape, face shape, and color can be used as the decisive features.
  • Feature Vector or Tuple: A group of important features listed in vector or tuple format for training a model.
  • Model: A specific representation learned from data using an ML algorithm. There are three main types of models in ML: supervised, unsupervised, and reinforcement models. Building a model involves three important phases: training, validation, and testing.
  • Dataset: A set of informatio...
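The terms feature, feature vector, and dataset, together with the training, validation, and test phases, can be made concrete with a small pure-Python sketch. The records and feature names below (`age`, `height_cm`) are hypothetical placeholders, not data from the book, and the 70/15/15 split ratio is an assumption. In Spark, the same two steps are typically handled by `pyspark.ml.feature.VectorAssembler` and `DataFrame.randomSplit`, which splits by approximate rather than exact proportions.

```python
import random

# Hypothetical feature names, in the fixed order used to build each vector.
FEATURE_NAMES = ("age", "height_cm")


def to_feature_vector(record, feature_names=FEATURE_NAMES):
    """Collect the chosen features of one record into an ordered tuple."""
    return tuple(record[name] for name in feature_names)


def split_dataset(rows, train_pct=70, val_pct=15, seed=42):
    """Shuffle rows and split them into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_train = len(rows) * train_pct // 100
    n_val = len(rows) * val_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# Hypothetical toy records: two features plus a label column.
records = [{"age": 20 + i, "height_cm": 150 + i, "label": i % 2} for i in range(20)]
dataset = [(to_feature_vector(r), r["label"]) for r in records]
train_set, val_set, test_set = split_dataset(dataset)
print(len(train_set), len(val_set), len(test_set))  # prints 14 3 3
```

Integer percentages are used so that the split sizes do not depend on floating-point rounding; whatever rows remain after the training and validation slices go to the test set.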

Citation Styles for Practical Machine Learning with Spark

APA 6 Citation

Gupta, G., Gupta, M., & Gupta, I. S. (2022). Practical Machine Learning with Spark ([edition unavailable]). BPB Publications. Retrieved from https://www.perlego.com/book/3508430/practical-machine-learning-with-spark-uncover-apache-sparks-scalable-performance-with-highquality-algorithms-across-nlp-computer-vision-and-ml-pdf (Original work published 2022)
