The Road to Machine Intelligence
Humans have long imagined mechanical tools that act in seemingly intelligent ways. These ideas permeate the stories we tell, from those going back thousands of years to the ones we enjoy today. The automata created by Hephaestus in Greek myths, the mechanical beings in the Mahabharata and other ancient Hindu texts, the genre‐defining fiction of Isaac Asimov and other writers – humans have always wondered how inanimate machines might be given an independent will that serves (and sometimes threatens) their creators.
When we discuss AI, one important step is delving into just what we mean by intelligence. Human beings have a clear sense of self and a rich internal world. Our decision making and knowledge owe to a blend of experience, intuition, superstition, and emotion – all the things that make us thinking creatures. AI as it exists today is much narrower in its cognitive potential, accomplishing only what it is designed to do.
AI as a modern scientific field of study and practice emerged in the mid‐twentieth century. It tracked the development of modern computer science, inspired and propelled in part by the work of British mathematician and computer scientist Alan Turing. In the 1930s, Turing demonstrated mathematically that a machine following simple rules could carry out any algorithmic computation, and he later devised the eponymous test for interrogating the presence of machine intelligence.
From those beginnings, the field of AI passed a series of milestones and inflection points that moved the technology forward. At the 1956 Dartmouth Summer Research Project on Artificial Intelligence, researchers Allen Newell and Herbert Simon presented what has been dubbed the first AI program, Logic Theorist, and computer scientist John McCarthy coined the term “artificial intelligence.” In the decades after, computer science and computational capabilities both evolved and improved. While there was heady excitement over what AI could potentially accomplish, however, the hardware, software, and algorithms were not yet powerful enough.
Over time, the technology advancements needed for AI, such as computer storage, steadily emerged. In the 1980s, techniques that would later underpin deep learning, such as backpropagation, were devised, opening the door to machine learning (rather than purely rules‐based code). Expert systems, a type of AI first conceived in the 1950s, took several decades to mature. These used symbolic logic, data‐driven processing, and outputs that could be understood beyond the realm of complex mathematics. The excitement was such that by the end of the 1980s, more than half of Fortune 500 companies were creating or using expert systems.2 Yet, for a variety of reasons, including the technical and cognitive limits of expert systems, this avenue of AI fizzled out.
In the 1990s, neural network research benefited from further technical innovation and more effective algorithms. Massively parallel processing also received research attention, seen most publicly in IBM's Deep Blue computer, which in 1997 beat world chess champion Garry Kasparov in a six‐game match. Thus, it took nearly half a century to progress from the origin of the concept of AI to a technology that exceeded human performance in a highly complex activity.
At the turn of the century, the pace of development in computational infrastructure and capabilities quickened. Advances in data storage and parallel processing, along with the data generation and connectivity permitted by the advent of the Internet, moved the field toward the computational power needed to realize its loftiest ambitions. Continued innovation around artificial neural networks made possible things like computer vision, wherein a cognitive tool could accurately classify an object in an image. Yet this type of AI and others like it were flummoxed by a fundamental issue – for machines to learn what an image contained, those images had to be labeled by a human.
For example, if there is a photo of a lion on the African savannah approaching a herd of gazelles, the machine learning tool has no sense of what is what. It does not know which is the lion and which is the gazelle, or even the concept of an animal in the wild. As such, lofty projects set out to hand‐label every object in massive databases of images. This became prohibitively laborious.
Then, in 2011, deep learning emerged in full. Stanford computer scientist Andrew Ng and Google engineer Jeff Dean constructed a neural network, pairing it with a dataset of 10 million images and a cluster of 1,000 machines. They let algorithms process the raw data, and in three days, the cluster had independently created categories for human faces and bodies, as well as cat faces. This was proof that computers could generate feature detectors without labeled data – a landmark demonstration of unsupervised learning.3
Over the last decade, these and other types of AI have proliferated and are being deployed at scale by organizations across every industry and sector. This has been aided by the enormous volumes of data generated by connected devices, the flexibility of cloud computing, and the development of critical hardware (e.g., the graphics processing unit). Today, organizations are operating in a period of vigorous innovation and exploration. They seek not just to automate components of the enterprise but to totally reimagine how business is conducted and identify use cases that were never before possible. To be sure, AI is no longer a “nice to have.” It is a competitive necessity.
Basic Terminology in AI
AI is not one thing; it is many things. It is an umbrella term for a variety of models, use cases, and supporting technologies. Importantly, the development of one machine learning technique does not necessarily make another obsolete. Rather, depending on the use case, different AI techniques may be most appropriate.
AI comes with a highly technical lexicon that can be opaque to people outside the data science field. The concepts in AI describe complex mathematics that can leave nontechnical people unsure of how AI actually works. There is no shortage of writing that probes and contests definitions in this evolving field. Yet, we do not need math to grasp the basics of AI. Definitions of relevant and often‐referenced terms include:
- Machine learning (ML) – At its most basic, ML consists of methods for automating algorithmic learning without human participation. The algorithm is supplied with data for training, and it independently “learns” to develop an approach to treating the data (based on whatever function the architect is optimizing). Machine learning methods might use both structured and unstructured data, though data processing for model training may inject some structure.
- Neural network – An NN loosely models how a brain functions, in as much as it uses connected nodes to process and compute data. It is not a distinct physical object but instead the way computations are set up in a virtual space within a computer. An NN contains an input layer, an output layer, and a number of hidden layers between them. Each layer is composed of nodes, and the connections between nodes across layers form the network. Data is inserted into the input layer, computations are autonomously performed through the hidden layers, and the algorithm produces an output.
- Deep learning (DL) – A subset of ML, DL is largely (though not exclusively) trained with unstructured, unlabeled data. A DL algorithm uses a neural network to extract features from the data, refine accuracy, and independently adjust when encountering new data. The “deep” in DL refers to the number of layers in an NN. A challenge in DL is that as layers are added to the NN, training becomes harder to optimize, and the task for data scientists is to adjust the network's parameters until the algorithm is tuned to deliver accurate outputs.
- Supervised learning – In ML, one approach is to feed an algorithm labeled datasets. Humans curate and label the data before model training, and the model is optimized for accuracy with known inputs and outputs. In supervised learning, there are a variety of model types for classification (i.e., sorting data into appropriate categories) and for regression (predicting continuous values by probing relationships between variables).
- Unsupervised learning – In this case, the training data is largely or entirely unlabeled and unstructured. The datasets are fed to an ML algorithm, and the model identifies patterns within the data, which it uses to reach an output that accurately reflects the real world. An example is the unsupervised learning approach Ng and Dean used in their 2011 image recognition experiment.
- Reinforcement learning – Similar to how humans learn from reward or reprimand, reinforcement learning is the ML approach in which an algorithm optimizes its function by producing an output and gauging the resulting “reward” – a process that could be simplistically described as “trial and error.”
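The supervised‐learning loop described above – labeled inputs, an output compared against known answers, and iterative adjustment – can be sketched in a few lines of code. The toy example below trains a tiny neural network (an input layer, one hidden layer, and an output layer) on a small labeled dataset, the classic XOR problem. The layer sizes, learning rate, and iteration count are illustrative choices for this sketch, not canonical settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled training data: inputs X with known outputs y (supervised learning).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Input layer (2 nodes) -> hidden layer (4 nodes) -> output layer (1 node).
W1 = rng.normal(size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    """Push data through the layers: input -> hidden -> output."""
    hidden = sigmoid(inputs @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

initial_error = np.mean((forward(X) - y) ** 2)

lr = 1.0  # learning rate (illustrative)
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: nudge weights to reduce error against the labels.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

final_error = np.mean((forward(X) - y) ** 2)
print(np.round(forward(X).ravel(), 2))
```

Because the inputs are paired with known outputs, the network's error can be measured directly at every step – the defining feature of supervised learning.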
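The trial‐and‐error dynamic of reinforcement learning can likewise be sketched with a toy example. Here a hypothetical epsilon‐greedy agent learns which of three slot‐machine “arms” pays out most often; the payout probabilities, exploration rate, and pull count are invented for illustration.

```python
import random

random.seed(42)

true_payout = [0.2, 0.5, 0.8]  # hidden reward probability per arm (unknown to agent)
estimates = [0.0, 0.0, 0.0]    # agent's learned value of each arm
pulls = [0, 0, 0]
epsilon = 0.1                  # chance of exploring a random arm

for _ in range(5000):
    # Explore occasionally; otherwise exploit the best-known arm.
    if random.random() < epsilon:
        arm = random.randrange(3)
    else:
        arm = max(range(3), key=lambda a: estimates[a])

    # Pull the arm and observe a reward of 1 or 0 -- the "trial."
    reward = 1 if random.random() < true_payout[arm] else 0

    # Update the running-average estimate for that arm -- learning from the "error."
    pulls[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / pulls[arm]

best_arm = max(range(3), key=lambda a: estimates[a])
print(best_arm, [round(e, 2) for e in estimates])
```

Unlike supervised learning, no one tells the agent the correct answer; it discovers the best arm purely by acting and observing rewards.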
While this list barely scratches the surface of AI vocabulary, it is sufficient for us to think critically about how AI training is conducted, how it can be applied, and where trust and ethics become important.