Spoken Language Understanding

Systems for Extracting Semantic Information from Speech

Gokhan Tur, Renato De Mori

About This Book

Spoken language understanding (SLU) is an emerging field between speech and language processing, investigating human/machine and human/human communication by leveraging technologies from signal processing, pattern recognition, machine learning, and artificial intelligence. SLU systems are designed to extract meaning from speech utterances, and their applications are vast, from voice search on mobile devices to meeting summarization, attracting interest from both commercial and academic sectors.

Both human/machine and human/human communications can benefit from the application of SLU, though the tasks and approaches used to understand and exploit each differ. This book covers the state-of-the-art approaches for the most popular SLU tasks, with chapters written by well-known researchers in the respective fields. Key features include:

  • Presents a fully integrated view of the two distinct disciplines of speech processing and language processing for SLU tasks.
  • Defines what is possible today for SLU as an enabling technology for enterprise (e.g., customer care centers or company meetings), and consumer (e.g., entertainment, mobile, car, robot, or smart environments) applications and outlines the key research areas.
  • Provides a unique source of distilled information on methods for computer modeling of semantic information in human/machine and human/human conversations.

This book can be successfully used for graduate courses in electronics engineering, computer science, or computational linguistics. Moreover, technologists interested in processing spoken communications will find it a useful source of collated information on the topic, drawn from the two distinct disciplines of speech processing and language processing under the new area of SLU.

Information

Publisher: Wiley
Year: 2011
ISBN: 9781119993940
Chapter 1
Introduction
Gokhan Tur (Microsoft Speech Labs, Microsoft Research, USA) and Renato De Mori (McGill University, Canada, and University of Avignon, France)
1.1 A Brief History of Spoken Language Understanding
In 1950, Turing published his most cited paper, entitled "Computing Machinery and Intelligence", trying to answer the question "Can machines think?" (Turing, 1950). He then proposed the famous imitation game, or Turing test, which tests whether or not a computer can successfully imitate a human in a conversation. He also prophesied that "at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted". Yet, now that we are well past the year 2000, we wonder whether he meant the end of the 21st century, when machines will be able to "understand" us.
Spoken language understanding (SLU) is an emerging field at the intersection of speech processing and natural language processing (NLP), leveraging technologies from machine learning (ML) and artificial intelligence (AI). While speech is the most natural medium people use to interact with each other, when using tools, machines, or computers we rely on many other modalities, such as the mouse, keyboard, or stylus, but rarely speech. Similarly, when people talk to each other, there is no record and the words are simply lost. However, there is strong interest, both commercial and academic, in understanding such communications. As speech recognition and NLP algorithms mature, these goals are no longer unreachable dreams. It is clear that we will see an increase in the number of SLU applications in the future. For example, robots will better understand what we say instead of reacting to a finite number of predefined commands, or we will be able to ask the computer to retrieve a phone call with a certain person about a particular topic.
Simply put, language understanding aims to extract "meaning" from natural language. In the literature on social sciences and artificial intelligence, many great names have worked on formalizing what constitutes meaning. Jackendoff, who has worked for many decades on the semantics of natural language and its lexical and syntactic expressions, admits that "meaning" is the "holy grail" of linguistics and philosophy (Jackendoff, 2002).
One favorite quote from Shannon's well-known 1948 paper on information theory reads: "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning. ... These semantic aspects of communication are irrelevant to the engineering problem" (Shannon, 1948). It is ironic that information-theory-based approaches now dominate the field of spoken language understanding. Interestingly, such data-driven approaches turned out to be very well suited for processing spoken language, after several decades of over-promised and under-delivered attempts that relied only on knowledge-based approaches trying to mimic the human brain in order to build intelligent agents. As one of the pioneers of statistical spoken language processing, Fred Jelinek, once put it: "The human analogy is misguided, because machines do things differently than biological beings. Airplanes don't flap their wings."
People no longer believe that machines will be able to converse like humans in the near future. On the other hand, a number of systems aimed at understanding have been developed in the last decade. It is possible to categorize these systems into three groups, based on their high-level approaches. The first category is the ELIZA-like systems, where the system does not really understand what is said, but merely mimics understanding. ELIZA was built at MIT in the 1960s (Weizenbaum, 1966) and has been the most popular artificial intelligence program, even though it had no real intelligence and instead relied on simple pattern matching. A typical interaction may contain excerpts such as:
User: I need some help
System: Why do you say you need some help?
User: I am depressed
System: Are you depressed often?
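As a rough illustration of the kind of keyword-and-template pattern matching such systems rely on, here is a minimal sketch; it is not the original ELIZA rule set, and the two rules below are invented solely to reproduce the excerpt above:

```python
import re

# Hypothetical, drastically simplified ELIZA-style rules: each regular expression
# captures part of the user's input, which is echoed back inside a canned template.
RULES = [
    (re.compile(r"\bi need (.*)", re.IGNORECASE), "Why do you say you need {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "Are you {0} often?"),
]

def respond(utterance: str) -> str:
    """Return the first matching template with the captured text substituted."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).strip(" .?!"))
    return "Please tell me more."  # fallback when no rule matches

print(respond("I need some help"))  # -> Why do you say you need some help?
print(respond("I am depressed"))    # -> Are you depressed often?
```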
Contemporary clones of ELIZA, such as ALICE, are moving towards embedding more sophisticated language processing technologies within the same framework.
The understanding systems in the second category are rooted in artificial intelligence. They have been demonstrated to be successful for very limited domains, using deeper semantics. These systems are typically heavily knowledge-based and rely on formal semantic interpretation, defined as mapping sentences into their logical forms. In its simplest form, a logical form is a context-independent representation of a sentence covering its predicates and arguments. For example, if the sentence is "John loves Mary", the logical form would be "(love john mary)".
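For concreteness, one simple way such a predicate-argument logical form could be represented in code; this is an illustrative sketch only, and the representation is not prescribed by the chapter:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class LogicalForm:
    """A context-independent predicate-argument representation of a sentence."""
    predicate: str
    arguments: Tuple[str, ...]

    def as_sexpr(self) -> str:
        # Render in the "(predicate arg1 arg2 ...)" notation used in the text.
        return "(" + " ".join((self.predicate,) + self.arguments) + ")"

lf = LogicalForm(predicate="love", arguments=("john", "mary"))
print(lf.as_sexpr())  # -> (love john mary)
```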
During the 1970s, the first systems for understanding continuous speech were developed, with interesting approaches for mapping language features into semantic representations. For this purpose, case grammars were proposed for representing sets of semantic concepts with thematic roles such as agent or instrument. The ICSI FrameNet project, for instance, focused on defining semantic frames for each of the concepts (Lowe and Baker, 1997). For example, in the "commerce" concept, there is a "buyer" and a "seller", and other arguments such as the "cost", "goods", and so on. Therefore, the two sentences "A sold X to B" and "B bought X from A" are semantically parsed as the same. Following these ideas, some researchers worked towards building universal semantic grammars (or interlingua), which assume that all languages share a set of semantic features (Chomsky, 1965). Such interlingua-based approaches also heavily influenced language translation until the late 1990s, before statistical approaches began to dominate the field. Allen (1995) may be consulted for more information on the artificial-intelligence-based techniques for language understanding.
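A toy sketch of the idea that two different surface forms map to the same frame; this is illustrative only, the role names follow the commerce example above, and the string patterns are hard-coded for just these two sentences:

```python
def parse_commerce(sentence: str) -> dict:
    """Toy frame 'parser' handling only the two surface patterns from the text."""
    tokens = sentence.split()
    if "sold" in tokens:        # pattern "A sold X to B"
        seller, goods, buyer = tokens[0], tokens[2], tokens[4]
    elif "bought" in tokens:    # pattern "B bought X from A"
        buyer, goods, seller = tokens[0], tokens[2], tokens[4]
    else:
        raise ValueError("unsupported sentence pattern")
    return {"frame": "commerce", "buyer": buyer, "seller": seller, "goods": goods}

# Both realizations yield the same semantic frame.
assert parse_commerce("A sold X to B") == parse_commerce("B bought X from A")
```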
The last category of understanding systems is the main focus of this book, where understanding is reduced to a (mostly statistical) language processing problem. This corresponds to attacking targeted speech understanding tasks instead of trying to solve the global machine understanding problem. A good example of targeted understanding is detecting the arguments of an intent given a domain, as in the Air Travel Information System (ATIS) (Price, 1990). ATIS was a popular DARPA-sponsored project focusing on building an understanding system for the airline domain. In this task, users utter queries on flight information, such as I want to fly to Boston from New York next week. In this case, understanding was reduced to the problem of extracting task-specific arguments in a given frame-based semantic representation involving, for example, Destination and Departure Date. While the concept of using semantic frames is motivated by the case frames of the artificial intelligence area, the slots are very specific to the target domain, and finding the values of properties from automatically recognized spoken utterances may suffer from automatic speech recognition errors and from poor modeling of the natural language variability in expressing the same concept. For these reasons, spoken language understanding researchers employed known classification methods for filling the frame slots of the application domain using the provided training data sets and performed comparative experiments. These approaches used generative models such as hidden Markov models (Pieraccini et al., 1992), discriminative classification methods (Kuhn and De Mori, 1995), and probabilistic context-free grammars (Seneff, 1992; Ward and Issar, 1994).
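To make the frame-based target representation concrete, here is a sketch of what a slot-filling component is expected to produce for the ATIS-style example above; the intent and slot names, including DepartureCity, are illustrative assumptions rather than the actual ATIS schema:

```python
utterance = "I want to fly to Boston from New York next week"

# The target frame-based semantic representation for this query (assumed names).
semantic_frame = {
    "intent": "flight",
    "slots": {
        "Destination": "Boston",
        "DepartureCity": "New York",
        "DepartureDate": "next week",
    },
}

# Statistical approaches typically recast slot filling as sequence tagging,
# e.g. assigning an IOB label to each recognized word:
iob_tags = ["O", "O", "O", "O", "O", "B-Destination", "O",
            "B-DepartureCity", "I-DepartureCity",
            "B-DepartureDate", "I-DepartureDate"]
assert len(utterance.split()) == len(iob_tags)
```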
While the ATIS project coined the term "spoken language understanding" for human/machine conversations, it is not hard to think of other interactive understanding tasks, such as spoken question answering or voice search, or of similar human/human conversation understanding tasks, such as named entity extraction or topic classification. Hence, in this book, we take a liberal view of the term spoken language understanding and attempt to cover the popular tasks which can be considered under this umbrella term. Each of these tasks has been studied extensively, and the progress is fascinating.
SLU tasks aim at processing either human/human or human/machine communications, and typically both the tasks and the approaches are quite different in each case. Regarding human/machine interactive systems, we start from the heavily studied tasks of determining an utterance's intent and its arguments, and their interaction with the dialog manager within a spoken dialog system. Recently, question answering from speech has become a popular task for human/machine interactive systems, and, especially with the proliferation of smart phones, voice search is now an emerging field with ties to both NLP and information retrieval. With respect to human/human communication processing, telephone conversations and multi-party meetings are studied in depth. Established language processing tasks, such as speech summarization and discourse topic segmentation, have recently been adapted to process human/human spoken conversations. The extraction of specific information from speech conversations, to be used for mining speech data and for speech analytics, is also considered, in order to ensure the quality of a service or to monitor important events in application domains.
With advances in machine learning, speech recognition, and natural language processing, SLU, sitting in the middle of all these fields, has improved dramatically during the last two decades. As the amount of available data (annotated or raw) has grown with the explosion of web sources and other kinds of information, another exciting research area is coping with spoken information overload. Since SLU is not a single technology, unlike speech recognition, it is hard to present a single application. As mentioned before, any speech processing task eventually requires some sort of spoken language processing. The conventional approach of plugging the output of a speech recognizer into a natural language processing engine is not a solution in most cases. The SLU application must be robust to speech recognition errors, to the particular characteristics of uttered sentences, and so on. For example, most utterances are not grammatical and contain disfluencies, and hence off-the-shelf syntactic parsers trained on written text sources, such as newspaper articles, fail frequently.
There is also strong interest from the commercial world in SLU applications. These typically employ knowledge-based approaches, such as building hand-crafted grammars or using a finite set of commands, and are now used in environments such as cars, call centers, and robots. This book also aims to fill the chasm between the approaches employed by the commercial and academic communities.
The focus of the book is to cover the state-of-the-art approaches (mostly data-driven) for each of the SLU tasks, with chapters written by well-known researchers in the respective fields. The book attempts to introduce the reader to the most popular tasks in SLU.
This book is proposed for graduate courses in electronics engineering and/or computer science. However, it can also be useful to social science graduates with relevant field expertise, such as psycholinguists and linguists, and to other technologists. Experts in text processing will notice how certain language processing tasks (such as summarization or named entity extraction) are handled with speech input. Members of the speech processing community will find surveys of tasks beyond speech and speaker recognition, with a comprehensive description of spoken language understanding methods.
1.2 Organization of the Book
This book covers the state-of-the-art approaches to key SLU tasks as listed below. These tasks can be grouped into two categories based on their main intended application area, processing human/human or human/machine conversations, though in some cases this distinction is unclear.
For each of these SLU tasks we provide a motivation for the task, a comprehensive literature survey, the main approaches and state-of-the-art techniques, and some indicative performance figures on established data sets for that task. For example, when template filling is discussed, the ATIS data is used since it is already available to the community.
1.2.1 Part I. Spoken Language Understanding for Human/Machine Interactions
This part of the book covers the established tasks of SLU, namely slot filling and intent determination as used in dialog systems, as well as newer understanding tasks which focus on human/machine interactions, such as voice search and spoken question answering. Two final chapters, one describing SLU in the framework of modern dialog systems and another discussing active learning methods for SLU, conclude Part I.
Chapter 2 History of Knowledge and Processes for Spoken Language Understanding
This chapter reviews the evolution of methods for spoken language understanding systems. Automatic systems for spoken language understanding using these methods are then reviewed, setting the stage for the rest of Part I.
Chapter 3 Semantic Frame Based Spoken Language Understanding
This chapter provides a comprehensive coverage of semantic frame-based spoken language understanding approaches as used in human/computer interaction. Being the most extensively studied SLU task, we try to distill the established approaches and recent literature to provide the reader with a comparative and comprehensive view of the state of the art in this area.
Chapter 4 Intent Determination and Spoken Utterance Classification
This chapter focuses on the task complementary to semantic template filling, i.e. spoken utterance classification techniques and ill...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright
  4. Dedication
  5. List of Contributors
  6. Foreword
  7. Preface
  8. Chapter 1: Introduction
  9. Part 1: Spoken Language Understanding for Human/Machine Interactions
  10. Part 2: Spoken Language Understanding for Human/Human Conversations
  11. Index