CHAPTER 7
CHATBOTS AND THE ILLUSION OF AWARENESS
"The future of computer power is pure simplicity."
- Douglas Adams, author of The Hitchhiker's Guide to the Galaxy
"Alexa, let's chat."
It's November 24, 2017, at Amazon's Day 1 headquarters in Seattle. I follow my escort through security and upstairs to the fifth floor, where I'm ushered into a room partitioned with walls of floor-to-ceiling black curtains. Brushing past the curtains, I take my seat in a modern wingback chair at the center of the room. On a small table are a writing tablet, a pen, and a pair of Sony studio headphones, as well as a matchbox-sized device that has a single button. At my left, a round silver reflector bounces illumination from the blindingly bright studio light set in front of me. In the darkness beyond that, I can just make out the figure of my videographer perched behind his tripod and digital camera, waiting patiently for our session to begin. I don the headphones and we run through a quick sound check. Though I can't see them, I know there are two other judges who, like me, are sitting in similar ad hoc mini studios.
The voice of our interactor comes through my headphones loud and clear as he resumes this round of the finals with the trigger phrase: "Alexa, let's chat." This is immediately followed by the now familiar computer-generated voice of Alexa, the virtual digital assistant software that runs on an ever-growing number of Amazon-supported appliances.
"Welcome to the Alexa Prize Grand Finale. Here's one of the finalists. Hello! This is an Alexa Prize socialbot. I think I have already heard your voice. Have we talked before?"
"We have," replies the interactor, one of three people selected to engage the competing socialbots in conversation according to specific guidelines.
"What a faithful chatmate you are. What was your name again?"
"Mike."
"Yeah, I think I remember you, Mike. But let's talk about you. How is your day going? Would you share with me some of your today's terrific experiences?"
We are in the midst of the first Alexa Prize, Amazon's $2.5 million competition designed to spur innovation in the field of conversational AI technology. Alexa is a voice service that runs programs known as skills; each team's conversational socialbot is one such skill. By 2019, there were more than 90,000 Alexa skills, from fitness trainers to smart home controllers to a version of the television game show Jeopardy! There are now hundreds of third-party devices with Alexa built in, plus Amazon's ever-growing list of devices, including the Amazon Echo on which it was first introduced to the public. More than 60,000 smart home products can be controlled with Alexa, including lighting, ceiling fans, thermostats, and smart TVs.
More than one hundred university teams from twenty-two countries applied to compete in the first year of the Alexa Prize competition, from which fifteen were eventually selected. The teams worked on their socialbots for much of the year, including training them through interactions with the public, and the competition is now down to the three finalist teams. In this first round of the finals, three judges, including me, are grading conversations between a human interactor and each team's socialbot. It is the job of the interactor to engage Alexa in what will hopefully be a coherent and interesting conversation. Once a judge feels a socialbot has become too repetitive or nonsensical, they press the small button at their table. After a second judge does the same, the conversation is stopped. The goal for each team is to keep the conversation going for as long as they can, ideally at least twenty minutes, the threshold for winning the grand prize. There are three separate rounds, with a different interactor and three different judges in each round. All of the socialbots and teams remain anonymous to the judges and interactors throughout the finals.
Though all of the socialbots speak with the same computer-generated voice, each team is running very different software. The programs sail through or stumble over different things, but overall, they are all surprisingly capable. In the end, Sounding Board, a team from the University of Washington, is the winner of a $500,000 prize to be split among themselves. Their team is made up of five doctoral students with expertise in language generation, deep learning, reinforcement learning, psychologically infused natural language processing, and human-AI collaboration, with guidance from three electrical engineering and computer science professors. Sounding Board managed to maintain an engaging conversation for an average of ten minutes and twenty-two seconds, shy of the twenty-minute threshold, though one of its conversations came very close. Had their socialbot managed to surpass this hurdle with two of the interactors and achieved an average rating from the judges of 4.0, they would have won an additional $1 million for their university.
Alexa is far from the only digital assistant in the world. All of the giants of AI are developing their own unique version of this potentially transformative interface. Microsoft's Cortana has been a presence on the Windows platform since 2015, beginning with Windows 10 and expanding to numerous other devices. Apple's Siri was originally released in 2011 on the iPhone 4S, and was subsequently made available on all of their iOS devices.
Drawing on their immense search engine infrastructure, Google developed Assistant, which was released in 2016. Google Assistant does a good job answering questions and performing tasks, responding quickly to spoken requests, but as with its competitors, it still has some way to go.
In 2018, Google Assistant became more accomplished with the integration of Google Duplex,1 an extension that allowed Assistant to autonomously place a call and schedule appointments and reservations with a person at the other end of the line. For the most part, the voice that Duplex uses is indistinguishable from a human voice, just one of the features that wowed listeners when it was first demonstrated in May 2018.
Woman: Hello, how can I help you?
Duplex: Hi, I'm calling to book a woman's haircut for a client? Umm, I'm looking for something on May third.
Woman: Sure, give me one second. . . . Sure, what time are you looking for around?
Duplex: At 12 p.m.
Woman: We do not have anything at 12 p.m. available. The closest we have to that is a 1:15.
Duplex: Do you have anything between 10 a.m. and uhh, 12 p.m.?
Woman: Depending on what service she would like. What service is she looking for?
Duplex: Just a woman's haircut for now.
Woman: Okay, we have a 10 o'clock.
Duplex: 10:00 a.m. is fine.
Woman: Okay, what's her first name?
Duplex: The first name is Lisa.
Woman: Okay, perfect. So, I will see Lisa at 10 o'clock on May third.
Duplex: Okay, great. Thanks.
Woman: Great. Have a great day. Bye.
People were amazed not only at the ability of the program to negotiate the complexities of human conversation, but also at how lifelike the speech was. It was realistic and nuanced, right down to the cadence, pauses, and filler words, such as "umm" and "uh." However, the demo raised serious questions almost immediately. The lack of identification of the business or person answering made some people wonder if the demo was canned or faked. For others, the fact that Duplex sounded so much like a human, yet hadn't identified itself as a bot, was of far greater concern, given the ethical issues it raised as well as the potential for abuse. Responding to this, the next month Google gave a second, lower-key demo that addressed most of these issues, especially the matter of identifying itself as "Google's automated booking service" at the beginning of the call. As impressive as the program was, at the time of the demos Duplex could only make haircut appointments and restaurant reservations and answer inquiries about business hours. Additionally, subsequent reports indicated portions of these exchanges were passed over to a human operator at some stage, presumably to further train the system. Nevertheless, Google clearly demonstrated the direction they want to take this technology.
Not to be outdone, in China, AI giant Baidu has developed DuerOS, a conversational AI system for use on devices, including its own Xiaodu and Raven smart speakers. DuerOS intelligent chips can be integrated into virtually any device to provide the functionality of its assistant software. Multinational conglomerate Tencent has developed Xiaowei, a digital assistant that integrates with their WeChat app, a multipurpose messaging, mobile payment, and social media app used by more than a billion people around the world. E-commerce giant Alibaba had enormous incentive to develop its digital assistant, Cainiao, given that the company delivers more than one billion packages a day. While Cainiao is currently being used only for coordinating package deliveries, it will no doubt expand to deal with many different tasks, including the fifty thousand customer service calls the company receives each day.
Clearly, we have reached an inflection point in the development of these digital assistants, but how did we get here? The field of artificial intelligence has a long history with chatbots, which is essentially what these teams are building. The development of these programs goes well beyond a fascination with talking computers. In his 1950 paper, "Computing Machinery and Intelligence,"2 mathematician, cryptanalyst, and computer scientist Alan Turing laid out his premise for what he called "The Imitation Game," in which a computer attempts to deceive a human judge into thinking it is actually a person. In time, this and other variants described in Turing's paper became known as the Turing test,3 and computer scientists, researchers, and hobbyists have sought to meet the challenge, either by attempting to create genuinely intelligent programs or through artifice, by using gimmicks to try to fool the human judges.
ELIZA, completed in 1966, is generally considered the first chatbot program. Developed by MIT AI professor Joseph Weizenbaum, ELIZA was based on a series of scripts that directed the program's responses according to the previous input statement provided by the person it was communicating with. Using its most famous script, DOCTOR, ELIZA would respond much as a Rogerian psychotherapist might. (Rogerian psychotherapy later became more commonly known as person-centered therapy.) For instance, a typical dialogue might go something like this:
ELIZA: Is something troubling you?
Person: Men are all alike.
ELIZA: What is the connection, do you suppose?
Person: They're always bugging us about something or other.
ELIZA: Can you think of a specific example?
Person: Well, my boyfriend made me come here.
ELIZA: Is it important to you that your boyfriend made you come here?
Person: He says I'm depressed much of the time.
ELIZA: I am sorry to hear that you're depressed.
Person: It's true. I am unhappy.
ELIZA: Can you explain what made you unhappy?
From our vantage point, this feels like a relatively logical if somewhat stilted conversation, but to a programmer it is a fairly straightforward script, built around a rules-based engine. The statements produced by the computer work by using pattern matching and substitution, incorporating a series of cue words or phrases extracted from the user's prior response.
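This pattern-matching-and-substitution approach can be sketched in a few lines of code. The rules, cue patterns, and canned responses below are invented for illustration and are not from Weizenbaum's original DOCTOR script, which had far more cues, ranked by priority:

```python
import random
import re

# Swapping first- and second-person words lets the program reflect
# the user's own statement back at them.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# Each rule pairs a cue pattern with response templates; {0}, {1}
# are filled with the reflected fragments captured from the input.
# The final catch-all rule fires when nothing else matches.
RULES = [
    (re.compile(r"i am (.*)", re.I),
     ["I am sorry to hear that you are {0}.", "How long have you been {0}?"]),
    (re.compile(r"(.*) made me (.*)", re.I),
     ["Is it important to you that {0} made you {1}?"]),
    (re.compile(r".*"),
     ["Can you think of a specific example?", "Please tell me more."]),
]

def reflect(fragment: str) -> str:
    """Swap pronouns in a captured fragment ("my boyfriend" -> "your boyfriend")."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(statement: str) -> str:
    """Return a reply built from the first rule whose cue matches the input."""
    statement = statement.strip().rstrip(".")
    for pattern, templates in RULES:
        match = pattern.match(statement)
        if match:
            fragments = [reflect(g) for g in match.groups()]
            return random.choice(templates).format(*fragments)
    return "Please go on."

print(respond("Well, my boyfriend made me come here."))
# -> "Is it important to you that well, your boyfriend made you come here?"
```

The stray "well," in the output illustrates a real weakness of the approach: the program has no understanding of what it captures, so any text that happens to match a cue is echoed back verbatim, and much of a script's sophistication went into hiding exactly this kind of artifact.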
Ironically, Weizenbaum created the program to demonstrate what he felt was the superficiality of computer communications, but he was surprised by the response of many of the people who engaged with it. Weizenbaum even relates the story of his secretary conversing with the program. Following a few exchanges, she asked Weizenbaum to leave the room so she could be alone with the computer! As he wrote in 1976, "I had not realized . . . that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."4 Needless to say, Weizenbaum was successful, though not in the way he had originally intended.
In the years that followed, chatbot technology evolved, and while some developers created new strategies, many continued to build on Weizenbaum's methods, which had been so successful in engaging and fooling human judges.
Unfortunately, the ability to fool a human is no true gauge of machine intelligence. As Weizenbaum unintentionally demonstrated, human-machine interaction is a two-way street, and it is evident we humans are all too ready to ascribe awareness and even personalities to aspects of our environment that have neither. In their book The Media Equation,5 Stanford professors Clifford Nass and Byron Reeves argue that we interact with much of our media as though it were another person. This may even extend to an unconscious tendency to treat other technologies (our computers, cars, boats, and other tools) as though they were also alive and self-aware. To quote from their book, "Individuals' interactions with computers, television, and new media are fundamentally social and natural, just like interactions in real life."
If true, this goes a long way toward explaining why, even with the earliest, most basic of chatbots, users have been willing to accept them as social, conversational, and sometimes even intellectual equals. This can even be true for some of those participants who are already aware of the programmed nature of their conversation partner.
Our readiness to accept these programs as conversational partners is a major reason I believe the chatbot and the virtual personal assistant are destined to become essential user interfaces in the near future. As technology has advanced, particularly in the field of computing over the last seventy years, well-designed interfaces have become desirable, if not essential. As Brenda Laurel wrote in her 1990 book, The Art of Human-Computer Interface Design,6 "We naturally visualize an interface as the place where contact between two entities occurs. The less alike those two entities are, the more obvious the need for a well-designed interface becomes." In so many respects, our technology is becoming increasingly complex and is in need of a universal, intuitive interface that allows us to easily and rapidly interact with it, whether we are veteran users or engaging the technology for the very first time.
Since the beginning of the computer age, we have sought out increasingly powerful, flexible, and natural user interfaces. Beginning from hardwired programs, we moved to punch cards and punch tape before these gave way to the command line interface, which allowed users to type commands directly into a system. Graphical user interfaces followed, using mice and monitors to provide a What-You-See-Is-What-You-Get, or WYSIWYG (pronounced wiz-e-wig), experience. As computers became even more powerful, there was enough spare processing power to implement a range of natural user interfaces, or NUIs. This progression clearly demonstrates a trend toward ever more natural means of interacting with our increasingly complex technology. Today, NUIs that allow us to use gesture, touch, and voice commands are becoming more and more common as a means of controlling our devices. The overall trend has democratized the use of computers, taking them from devices once only computer scientists could operate, to being accessible to enthusiasts, to today, when they can be used by children and even toddlers. As we enter the third decade of the twenty-first century, language-enabled virtual assistants are being developed to ...