Deep Learning in Visual Computing

Explanations and Examples

About This Book

Deep learning is a form of artificial intelligence that teaches itself from data and can be used to make predictions. Deep learning mimics the human brain and provides learned solutions to many challenging problems in the area of visual computing. From object recognition to image classification for diagnostics, deep learning has shown the power of artificial deep neural networks in solving real-world visual computing problems with super-human accuracy. The introduction of deep learning into the field of visual computing has spelled the end of many traditional image processing and computer vision techniques. Today, deep learning is considered to be the most powerful, accurate, efficient and effective method available, with the potential to solve many of the most challenging problems in visual computing.

This book provides an insight into deep machine learning and into the challenges in visual computing that this novel method of machine learning can tackle. It introduces readers to the world of deep neural network architectures with easy-to-understand explanations. From face recognition to image classification for the diagnosis of cancer, the book provides unique examples of solved problems in applied visual computing using deep learning. Interested and enthusiastic readers of modern machine learning methods will find this book easy to follow. They will find it a handy guide for designing and implementing their own projects in the field of visual computing.


Information

Author: Hassan Ugail
Publisher: CRC Press
Year: 2022
ISBN: 9781000625455
Edition: 1

CHAPTER 1 Introduction

In September 2017, Apple released its iPhone X. With it came Face ID, Apple's advanced facial recognition-based biometric system, embedded within the hardware and software of the phone. Shortly afterwards, I managed to lay my hands on an iPhone X. To use facial recognition as an identification system, and to gain access to various apps, including the app that lets me into my bank account, I first had to enroll my face. The enrolling process involved following a simple set of instructions in which I had to present my face at certain poses and angles to the phone's camera. Within a few minutes of enrolling my face, I was able to start enjoying the convenience of simply looking at my phone to unlock it and use it.
On to another story. A few years back, I was on a regular walk in the local park with my son, who was four years old at the time. Children, with their small size, large eyes, chubby cheeks, and the innocence and exuberance on their faces, seem to naturally attract adults' attention and affection. As we were strolling along a small pathway in the park, we encountered a middle-aged lady who briefly stopped and said "Hello" to my son, while giving him a social smile. My son gently returned the greeting. Since then, we have often seen her in the local park, and we exchange greetings regularly. My son will probably never forget her face. The most common visual object we humans process is the face. The human brain has an amazing ability to process faces and to recognise them, often decades after the first glimpse of a given face.
There is one key parallel one could draw between how the human brain distinguishes faces and how an algorithmic face recognition system such as Apple's Face ID works. For a human, recognising a face is a trivial task. Though all faces are similar, with two eyes, a nose, and a mouth in a uniform configuration, we can still identify and recognise faces at many different angles and under many lighting conditions, and we often do this with ease. It is believed that humans perform image recognition through a template matching process in which the perceived objects, for example faces, are stored in long-term memory in the form of a discrete set of features that truly represent a given object. When a new object is presented, its features can be compared with those existing in long-term memory to retrieve the best match. Thus, two distinct processes can be identified here, the first being the learning process in which the brain teaches itself about specific objects through their features. For example, in the story of my son at the park, his brain learnt the face of the lady he met there. Then, in the second process, the input face is compared to the stored face templates to find the best match.
Interestingly, Apple's Face ID identifies and discriminates between faces in much the same way as we humans identify and recognise faces in our brain. At the enrolling stage, Face ID reads the face from various positions and angles and ultimately converts the physical features of the face into a mathematical representation, often a few hundred floating-point numbers per face. For recognition, the infrared sensor on the phone reads the face, computes the mathematical representation corresponding to it, and makes a "distance-wise" comparison with the stored template face(s). Again, one can see that, like the brain, the face matching system first learns the face, and then compares it to the template faces to see if there is a match that satisfies a given threshold of accuracy.
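To make the idea of a "distance-wise" comparison concrete, here is a minimal sketch (not Apple's actual implementation) in which a probe face embedding is compared against a stored template using Euclidean distance; the embedding values and the threshold are illustrative assumptions only.

```python
import numpy as np

def is_match(probe, template, threshold=0.9):
    """Return True if the probe embedding is close enough to the stored template.

    probe and template are 1-D arrays of floating-point numbers (the
    mathematical representation of a face); threshold is a hypothetical value.
    """
    distance = np.linalg.norm(probe - template)  # Euclidean distance
    return distance < threshold

# Toy 4-dimensional embeddings; real systems use a few hundred dimensions.
stored_template = np.array([0.12, -0.48, 0.33, 0.90])
probe_same_person = np.array([0.10, -0.45, 0.35, 0.88])
probe_other_person = np.array([0.80, 0.20, -0.60, 0.10])

print(is_match(probe_same_person, stored_template))   # True
print(is_match(probe_other_person, stored_template))  # False
```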
The fundamental idea behind object recognition, both in the computer world and in the human brain, is information compression, or the automatic formation of useful representations from data. Foundationally, we can refer to this process of representational learning as deep learning. A good example of this is the way we often represent the structure of an atom, as shown in Figure 1.1. Whilst an atom contains electrons that surround a nucleus composed of protons and neutrons, its structure is never as neatly defined as Figure 1.1 suggests. In fact, the position of an electron at any given time is never predictable, and only a probability estimate of where it is located around the nucleus can be obtained. However, presenting the structure of an atom as shown in Figure 1.1 helps us to form a useful representation of atoms, molecules, and macroscopic matter.
The brain of a human contains well over 100 billion neurons. Together these neurons build a colossal network that is parallel and distributed. Functional activities such as seeing and interpreting images are carried out using these networks. Similarly, in the digital algorithmic world, artificial neural architectures can be created to mimic how the brain learns and performs complex tasks. In a visual computing setting, as an example, one can imagine providing millions and millions of images to a machine learning algorithm as training data. The algorithm figures out the unique patterns in the image data and ultimately puts them into specific categories through representational learning. The important point to note here is that the neural network algorithm need not be explicitly coded to detect certain features visible or classifiable by a human. Rather, the algorithm is left to its own devices to figure out the complex patterns and classify the data appropriately. Thus, with sufficient data, the neural network learns about the objects on its own through a training and testing process.
Figure 1.1: A structural representation of the carbon atom.
At a very high level, deep neural networks are composed of encoders and decoders. The function of the encoder is to find useful patterns in the data and come up with a representational form of learning. Similarly, the function of the decoder is to generate high-resolution data from the representations, where the generated data takes the form of new examples or expressive knowledge. Fundamentally, this is how deep learning works.
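As an illustration of this encoder/decoder composition, the sketch below defines a minimal autoencoder in PyTorch; the layer sizes, latent dimension and image shape are arbitrary assumptions chosen only to show how an encoder compresses an input into a compact representation and a decoder reconstructs data from it.

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """A minimal encoder/decoder pair for 28x28 greyscale images."""

    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: compress the 784 pixel values into a compact representation.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the image from the compact representation.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)          # representational learning
        return self.decoder(code).view(-1, 1, 28, 28)

# A single random "image" passed through the (untrained) network.
image = torch.rand(1, 1, 28, 28)
reconstruction = TinyAutoencoder()(image)
print(reconstruction.shape)  # torch.Size([1, 1, 28, 28])
```

In practice such a network would be trained by minimising the difference between the input and its reconstruction, forcing the encoder to discover useful patterns in the data.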
With the recent explosion of powerful computational tools and sensor data, deep learning has become a tool that is central to machine learning. Deep learning has found use in a great array of applications. Virtual assistants such as Siri and Alexa, biometric face recognition, chatbots, image analysis tools, and computer-aided diagnostic systems are being actively driven by deep learning-assisted models and algorithms. In fact, the entire field of artificial intelligence has recently taken centre stage due to the explosive developments in deep learning algorithms and the availability of pre-trained deep learning models. Furthermore, many of the problems in the domain of visual computing are being addressed using deep learning methods and tools. From basic image analysis tasks to complex disease diagnosis exercises, deep learning has taken a central role in almost all visual computing-related problems. Therefore, a book solely on the accumulated knowledge and recent developments in deep learning in the domain of visual computing deserves to be authored.
This book outlines the methods and techniques of deep learning applied to problems arising from the domain of visual computing. The first section of this book is devoted to explaining deep learning, starting from the fundamental concepts through to the mechanisms of crafting deep learning models. The book covers sufficient theory for the reader to grasp the essential elements of deep learning. To supplement the theory, the book also covers several practical examples to demonstrate how deep learning can be successfully utilised in the realm of visual computing.

CHAPTER 2 The Foundations of Deep Learning

This chapter provides brief introductory material on deep learning. We present the fundamental concepts and techniques one must bear in mind to understand deep learning from the point of view of solving problems in visual computing.

Introduction

Artificial intelligence (AI) is defined as a technique that enables a machine or an algorithm to mimic human behaviour. Much of the engine of artificial intelligence is driven by the methods and techniques of Machine Learning (ML), with deep learning being part of it. Deep learning allows computational models to learn and represent data in a manner mimicking how the human brain perceives and understands multi-modal information. Though deep learning has become very popular only recently, its history is rather long.
Deep learning is a mechanism by which a machine algorithm can learn by example. It is a mechanism for obtaining an optimal configuration of a model so that the desired output can be obtained from a set of input data. Mathematically, one can think of this process as obtaining a function f that maps a set of inputs (x1, x2, ..., xm) to a set of outputs (y1, y2, ..., yn). There exist many ways one can obtain such relationships, starting from linear approximations through to complex non-linear forms. Often, explicit mathematical relationships between variables associated with physical laws or phenomena can be formulated. For example, take the well-known Newton's second law of motion, which states that the force F applied to an object of mass m is the product of m and the acceleration a, the rate at which the velocity of the object changes. This is simply formulated as the linear relationship F = ma. Therefore, given the input values of m and a, one could easily predict the force that must be exerted. Similarly, if we take Einstein's famous equation, E = mc², we can see that a non-linear relationship exists between the energy E of a piece of matter of mass m and the speed of light c.
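As a simple worked illustration of these two explicit relationships, the snippet below computes the force and the energy directly from the formulas; the input values are arbitrary assumptions chosen only for the example.

```python
# Newton's second law: F = m * a (a linear relationship).
mass = 2.0          # kg
acceleration = 9.8  # m/s^2
force = mass * acceleration
print(f"F = {force:.1f} N")  # F = 19.6 N

# Einstein's mass-energy relation: E = m * c**2 (non-linear in c).
c = 299_792_458.0   # speed of light in m/s
energy = mass * c ** 2
print(f"E = {energy:.2e} J")  # roughly 1.8e+17 joules
```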
For a selected set of physical phenomena, and for more straightforward real-life situations, such correlations between variables can be inferred and then written down explicitly in a linear or non-linear form. However, many real-world problems are too complicated or too dynamic for such explicit relationships between the associated variables to be found.
Consider the relationship between the input and output of a system as shown graphically in Figure 2.1. The input and output relationship shown in Figure 2.1(a) is clearly linear and can be modelled using a function of the form F(x) = ax + b, where a and b are feature values (or parameters) in the model space. In some sense, the function F(x), with specific values of a and b that closely approximate the dataset in Figure 2.1(a), is a learned model. If the input and output pattern of the model now changes, as shown in Figure 2.1(b), the linear model is no longer valid. However, we can resort to a model of the form F(x) = ax³ + bx² + cx + d or F(x) = Ce^(Ax) (where a, b, c, d, C and A are the parameters of the corresponding non-linear models), which would now be more suitable for modelling the data shown in Figure 2.1(b).
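As a small sketch of how the parameters of such models can be learned from input/output samples, the example below fits a linear and a cubic model to synthetic data (stand-ins for the datasets of Figure 2.1, which are not reproduced here) using an ordinary least-squares polynomial fit.

```python
import numpy as np

# Synthetic input/output samples standing in for the data of Figure 2.1.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
y_linear = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)
y_cubic = 0.5 * x**3 - 2.0 * x + 0.7 + rng.normal(scale=0.1, size=x.shape)

# Learn the parameters a and b of F(x) = a*x + b ...
a, b = np.polyfit(x, y_linear, deg=1)

# ... and the parameters a, b, c, d of F(x) = a*x^3 + b*x^2 + c*x + d.
coefficients = np.polyfit(x, y_cubic, deg=3)

print(f"linear model:  F(x) = {a:.2f}x + {b:.2f}")
print("cubic model coefficients:", np.round(coefficients, 2))
```

The same least-squares idea extends to the exponential form F(x) = Ce^(Ax), for instance by fitting a straight line to the logarithm of the outputs.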
Though linear and non-linear models are suitable for modelling simple datasets, for real-life applications this sort of functional modelling is challenging to obtain. For example, for a visual computing task such as identifying an apple in an image, seeking a linear combination of pixels that maps to the image of an apple is unattainable. Even with a simple combination of non-linear functions, such a task can be daunting, with a huge number of, often conflicting, parameters to deal with. As a result, one must look for a generic methodology through which the complex and multidimensional datasets that arise in real-life applications can be modelled. This is one area where machine learning, and especially the technique of deep learning, can help us. Such models can take sufficiently large data as input and process it to extract the general patterns in the data using high-level information or knowledge. They are geared to mimic how humans infer objects and how humans extract knowledge from them. As a result, a plausible avenue for teaching a machine about the world is to mimic the human brain, particularly how the neurons utilise data for learning and inferring.
Figure 2.1: Illustration of (a) a linear and (b) a non-linear relationship between the input and output of a system.

The Perceptron

The first machine learning computational model, inspired by the neurons in the brain, was proposed by Warren McCulloch and Walter Pitts, both of whom worked in the field of computational neuroscience back in the 1940s. They wanted to understand how the human brain could produce intricate patterns by using interconnected neurons. Their model of the neuron later served as the foundation for the perceptron, proposed in 1958 by Frank Rosenblatt, who was working at the Cornell Aeronautical Laboratory at the time.
The McCulloch and Pitts perceptron is a mathematical model based on the working concept of a biological neuron. Neurons are interconnected nerve cells within the brain and are collectively known as neural networks. They function by processing and transmitting information via electrical and chemical signals. A biological neuron is stimulated when an action potential is generated because of a change in the ion concentration across the cell membrane. Generally, there is a higher concentration of sodium ions in the extracellular space, while there is a higher concentration of potassium ions within the intracellular space. During an action potential, ions are transported back and forth across the neuronal membranes, causing an electrical change that transmits the nerve impulse.
Similar to the functioning of a human neuron, the perceptron neural model receives a series of incoming signals x1, x2, x3, each of which can be either excitatory or inhibitory. The signals can be weighted through w1, w2, w3, so that their effect can be adjusted as necessary. Finally, the weighted sum is computed. If the weighted sum of the incoming signals reaches a chosen threshold, the model gives an output of 1; if not, the output is 0. Thus, the model bases its output ...
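A minimal sketch of this thresholding computation is given below; the weights, threshold and example inputs are illustrative assumptions (chosen so that the unit behaves like a logical AND) rather than values taken from the text.

```python
def perceptron(inputs, weights, threshold):
    """McCulloch-Pitts style unit: output 1 if the weighted sum of the
    incoming signals reaches the threshold, otherwise output 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Illustrative weights and threshold that make the unit act as a logical AND.
weights = [1.0, 1.0]
threshold = 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", perceptron([x1, x2], weights, threshold))
# (0, 0) -> 0   (0, 1) -> 0   (1, 0) -> 0   (1, 1) -> 1
```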

Table of contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. Dedication
  5. Preface
  6. Acknowledgements
  7. Table of Contents
  8. 1. Introduction
  9. 2. The Foundations of Deep Learning
  10. 3. Deep Learning Models for Visual Computing
  11. 4. Deep Face Recognition
  12. 5. Age Estimation from Face Images using Deep Learning
  13. 6. The Nose and Ethnicity
  15. 7. Analysis of Skin Burns using Deep Learning
  15. 8. Deep Learning Approaches to Cancer Diagnosis using Histopathological Images
  16. 9. A Deep Transfer Learning Model for the Analysis of Electrocardiograms
  17. 10. Advances in Visual Computing through Deep Learning
  18. 11. Frontiers and Challenges in Deep Learning for Visual Computing
  19. Index
  20. About the Author