An Introduction to Natural Language Processing Through Prolog

Name: An Introduction to Natural Language Processing Through Prolog
Author: Clive Matthews

Pages: 318
Language: English

About This Book

Research into Natural Language Processing - the use of computers to process language - has developed over the last couple of decades into one of the most vigorous and interesting areas of current work on language and communication. This book introduces the subject through the discussion and development of various computer programs which illustrate some of the basic concepts and techniques in the field. The programming language used is Prolog, which is especially well-suited for Natural Language Processing and those with little or no background in computing.

Following the general introduction, the first section of the book presents Prolog, and the following chapters illustrate how various Natural Language Processing programs may be written using this programming language. Since it is assumed that the reader has no previous experience in programming, great care is taken to provide a simple yet comprehensive introduction to Prolog. Due to the 'user friendly' nature of Prolog, simple yet effective programs may be written from an early stage. The reader is gradually introduced to various techniques for syntactic processing, ranging from Finite State Network recognisers to Chart parsers. An integral element of the book is the comprehensive set of exercises included in each chapter as a means of cementing the reader's understanding of each topic. Suggested answers are also provided.

An Introduction to Natural Language Processing Through Prolog is an excellent introduction to the subject for students of linguistics and computer science, and will be especially useful for those with no background in the subject.

Information

Publisher: Routledge
Year: 2016
ISBN: 9781317898337
Topic: Languages and linguistics
Subtopic: Linguistics

PART I

Introduction

CHAPTER 1

Natural Language Processing

In almost every aspect of our daily experience, we are touched by the computer revolution. Computers control our cars, heating systems and washing machines; they are crucially involved in the production of our newspapers and weather forecasts, our bank statements and supermarket bills, the special effects on our television screens and the music from our sound systems; and concepts derived from computing subtly shape and organise our perceptions of the world. However, pervasive as computers already are, their potential is so great that they are destined to play an ever more integral role in our lives.

If society is to enjoy the full benefits of these advances, all of its members will need equal access to the technology. This is not always the case at present. Today’s computer user has to be an expert of sorts, becoming ‘computer literate’, in order fully to exploit current applications. The problem is that many potential users have neither the time, inclination nor aptitude necessary to acquire this knowledge. For them, the computer is a forbidding object, best left in the hands of specialists, and only to be experienced indirectly. A large part of the next stage of the computer revolution will be involved in overcoming this literacy barrier.

It is most unlikely that the solution to this problem will be a simple increase in educational resources since the results will be highly variable; if computer literacy is partially a matter of aptitude, there will always be some who will remain challenged by the technology. A far more radical alternative is to turn the problem on its head and design computers that are ‘human literate’. In this way, we will be able to interact with them on our terms rather than theirs so that users will require no special training to operate them. Since a large part of human communication is conducted effortlessly through natural languages such as English, Japanese or Swahili, the ability of computers to converse in such languages will be one of the crucial components in making them ‘human literate’. The capacity of a computer to ‘understand’ a natural language is referred to as Natural Language Processing (NLP). This book is an introduction to some of the elementary programming techniques involved in designing and building a computer with an NLP capability.

The rest of this chapter explores some of the potential applications of NLP technology. As a broad division, it is useful to think of two types of application: those that will facilitate the flow of information between man and machine (section 1.1) and those where the aim is to improve communication between man and man (section 1.2). As section 1.3 outlines, NLP can also benefit the study of linguistics irrespective of its practical applications.

1.1 Natural Language Interfaces

Most computer applications are intended as aids to human activities rather than as free-standing agents. For this reason it is important that the interface between human and computer be user-friendly, in the sense of being easy to learn, use and understand. The history of computing is, in part, a story of increasingly friendly interfaces. For some applications, the logical extension of this process is the provision of natural language interfaces.

Natural language interfaces are not necessarily the best solution for all applications. For example, graphical interfaces linked to a pointing device such as a mouse are an efficient and user-friendly method in many cases. However, where the input to the system is textual, a natural language interface has various advantages as some of the examples in this section show.

A database management system enables the information in a database to be altered, sorted, and retrieved at the command of the user. There are innumerable questions and requests that a user might want to issue to such a system. Here is a sample.

Which cities’ sales were greater than we targeted for?

Produce a graph of last year’s sales for Cambridge by week.

Change this year’s target for Norwich from £12,250 to £13,000.

When do our earliest records for Seiler PLC date from?

Display all those under target for the last two years in bold print throughout.

In the early days of computing, commands such as these could only be expressed through a machine language, the ‘native’ language of the computer. The following gives an idea of what such instructions look like.

0100101100000100

1010000111010100

0000000110010011

0011110011111001

Each line is a direction to perform one of the machine’s basic operations – compare, copy, add, multiply and so on. They are expressed in binary code – combinations of 0s and 1s. These basic operations are so primitive that even to get the machine to perform a simple task, such as adding two numbers together, requires several lines of code. Since every detail of how to carry out more intricate commands must be fully specified, machine language is far too cumbersome a language in which to express the types of instruction required of a database management system.

Matters are not much improved with assembler languages. These use mnemonic expressions for the instructions. For instance, the previous lines of machine code written in assembler might look as follows.

LOAD 3, A

LOAD 4, B

ADD 3, 4

STORE 3, C

Assembler statements must be translated into machine language before they can be executed by the computer. This is performed automatically by a program called an assembler.

Assembler languages are closely allied to machine code and are only marginally more friendly. High-level programming languages such as FORTRAN, COBOL or BASIC are an improvement. They allow the replacement of chunks of assembler/machine code with single expressions. For instance, the previous example might now be reduced to:

ADD A, B GIVING C

Again, this instruction has to be compiled into machine code before it can be executed by the machine although this is hidden from the user.
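The relationship between the single high-level statement and the assembler steps it stands for can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names translate and run are hypothetical, the register numbers 3 and 4 simply follow the example above, and the STORE mnemonic for writing the result back to memory is an assumption rather than the instruction set of any real machine.

```python
import re


def translate(statement):
    """Translate 'ADD X, Y GIVING Z' into assembler-style steps (toy model)."""
    match = re.fullmatch(r"ADD (\w+), (\w+) GIVING (\w+)", statement.strip())
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    x, y, z = match.groups()
    return [
        f"LOAD 3, {x}",   # copy X from memory into register 3
        f"LOAD 4, {y}",   # copy Y from memory into register 4
        "ADD 3, 4",       # add register 4 to register 3
        f"STORE 3, {z}",  # assumed mnemonic: write register 3 back to Z
    ]


def run(program, memory):
    """Execute the toy instructions against a dict standing in for memory."""
    registers = {}
    for line in program:
        opcode, operands = line.split(" ", 1)
        first, second = [part.strip() for part in operands.split(",")]
        if opcode == "LOAD":
            registers[int(first)] = memory[second]
        elif opcode == "ADD":
            registers[int(first)] += registers[int(second)]
        elif opcode == "STORE":
            memory[second] = registers[int(first)]
        else:
            raise ValueError(f"unknown instruction: {line!r}")
    return memory


program = translate("ADD A, B GIVING C")
result = run(program, {"A": 2, "B": 5})
print(program)
print(result["C"])
```

One high-level line expands into four lower-level steps, which is the sense in which such languages replace chunks of assembler with single expressions.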

Programming languages allow maximal control over the computer. The price paid is the high degree of expertise required to exploit them. User-friendliness, on the other hand, tends to be achieved at the expense of control and flexibility. Consider the use of pull-down menus. These allow the user to choose from a finite number of pre-selected commands using some kind of pointing device. The technique can even be adapted to produce natural language input, built up from words and phrases selected from a series of menus. The following illustrates such an interface where the user has constructed the sentence What were the stock prices for the Channel Tunnel for each quarter in 1995? from the various options available.

WHAT

is the current quote for
is the option price for
*WERE THE STOCK PRICES FOR*
are the estimated earnings for
are the headings for

IBM
*CHANNEL TUNNEL*
Coca-Cola Co.
Euro-Disney

on the London exchange
on the American exchange
on the New York exchange
on the Tokyo exchange

for each month in
*FOR EACH QUARTER IN*
for the last 12 days
for the last month

1993
1994
*1995*
1996

Menu-based interfaces are easy to learn and use, efficient and robust. However, a menu-based format only works well if there is a fairly limited number of choices. It also restricts the range of expression; for example, the question What were the stock prices for Channel Tunnel for each quarter in 1995? can only be phrased this way in the illustrated menu system. For many applications, such restrictions are unproblematic and a menu-based interface is an effective choice. With large databases, however, the constraints are probably too limiting; for example, imagine a database containing financial information on five hundred companies and how many screens it would take to display them all.
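The way a menu system both builds a sentence and limits what can be said can be sketched as follows. The menus are transcribed from the illustration; the names MENUS, build_sentence and count_sentences are hypothetical, the exchange menu is left out because the example sentence does not use it, and treating every remaining menu as compulsory is a simplifying assumption.

```python
from math import prod

# Menus transcribed from the illustrated interface (exchange menu omitted).
MENUS = [
    ["What"],
    ["is the current quote for", "is the option price for",
     "were the stock prices for", "are the estimated earnings for",
     "are the headings for"],
    ["IBM", "Channel Tunnel", "Coca-Cola Co.", "Euro-Disney"],
    ["for each month in", "for each quarter in",
     "for the last 12 days", "for the last month"],
    ["1993", "1994", "1995", "1996"],
]


def build_sentence(choices):
    """Join one selection per menu (given as list indices) into a question."""
    if len(choices) != len(MENUS):
        raise ValueError("one choice per menu is required")
    words = [menu[index] for menu, index in zip(MENUS, choices)]
    return " ".join(words) + "?"


def count_sentences(menus):
    """Number of distinct selections: the product of the menu sizes."""
    return prod(len(menu) for menu in menus)


sentence = build_sentence([0, 2, 1, 1, 2])
total = count_sentences(MENUS)
print(sentence)
print(total)
```

Every question the user can ask is one of a fixed, countable set of selections (not all of them sensible), and each wording is fixed in advance. Extending the company menu to five hundred entries would keep the set finite but make that one menu far too long to display comfortably.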

The only alternative, given these problems, is to try to make the interface language easier to use. Database query languages are the result. The following shows the equivalent of ‘Which cities’ sales were greater than we targeted for?’ in such a language.

SELECT city, sales, target FROM gb WHERE sales > target

Here gb is the name of the file containing the relevant data, and city, sales and target are the names of the fields of information in each record. Assuming that the user knows how the database is structured, it is relatively easy to ask simple questions with a query language. However, such languages are unforgiving of mistakes; using town instead of city, for instance, would fail to produce the required information. Further, more complex questions soon start to become quite opaque; the following is equivalent to ‘Which customers have greater than average balances?’

SELECT name, balance FROM customers

WHERE balance >

(SELECT avg (balance) FROM customers)
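Both queries, and the unforgiving treatment of a wrong field name, can be tried out with Python's built-in sqlite3 module. This is a minimal sketch rather than the system the text has in mind: the figures and the customer names other than Seiler PLC are invented for illustration, and SQLite stands in for whatever database management system is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data, invented for illustration only.
conn.execute("CREATE TABLE gb (city TEXT, sales INTEGER, target INTEGER)")
conn.executemany(
    "INSERT INTO gb VALUES (?, ?, ?)",
    [("Cambridge", 15000, 14000), ("Norwich", 11000, 12250)],
)
conn.execute("CREATE TABLE customers (name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Seiler PLC", 900), ("Acme Ltd", 300), ("Brown & Co", 600)],
)

# "Which cities' sales were greater than we targeted for?"
over_target = conn.execute(
    "SELECT city, sales, target FROM gb WHERE sales > target"
).fetchall()

# A field name the table does not have is rejected outright.
try:
    conn.execute("SELECT town, sales, target FROM gb WHERE sales > target")
    town_error = None
except sqlite3.OperationalError as exc:
    town_error = str(exc)

# "Which customers have greater than average balances?"
above_average = conn.execute(
    "SELECT name, balance FROM customers "
    "WHERE balance > (SELECT avg(balance) FROM customers)"
).fetchall()

print(over_target)
print(town_error)
print(above_average)
```

The query with town does not return an approximate answer; it simply fails, which is the kind of rigidity a natural language interface is meant to remove.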

The progression from machine to database query languages is one of ever greater approximation to English. It would seem that the natural conclusion to this process is the use of English itself as the interface language. This way flexibility would be married to user-friendliness and efficiency. Flexibility because there is no constraint on what can be expressed in English; user-friendliness because English is our natural means of communication in most situations; and efficiency because it is always possible to express any instruction more concisely in English compared with any programming language. This is an impressive list of advantages which makes a natural language interface an attractive proposition.

This, however, is not the end of the story. A natural language front-end, by itself, is unlikely to make a database management system any more accessible to the non-specialist user. The problem is that information retrieval requires a certain degree of expertise in order to be effective. One needs to know, for instance, the most relevant data for a particular purpose, whether any other data should be taken into account, how it might be best presented and so on. In other words, it is not much use being able to ask a question in English if you do not know which question you should be asking in the first place. A solution to this is for the natural language front-end to interface with an expert system, a program able to provide the necessary guidance and through which the database will be indirectly accessed.

Expert systems need not be confined to advice on information retrieval. Examples have been built in various fields including medical diagnosis, the interpretation of military intelligence reports and the design of experiments in molecular genetics. They are attractive for a number of reasons: they allow access to expertise at times and in places where a human expert would be unavailable or too expensive, free up the human expert for more taxing problems, improve the consistency of decision making and so on.

The ideal form of any interface depends not only on the nature of the application but also on how often the system is accessed by its typical user. The same applies to expert systems; interfaces engaged by experts on a regular basis do not need to be as user-friendly as those used on a more casual basis by a broader-based clientele. There are numerous potential extensions to the technology of this latter type, from systems selling insurance and providing financial advice to computerised travel agents. It is in these cases that a natural language interface will be the optimal solution for reasons already mentioned.

The development of expert systems has led to an interesting extension which will also benefi...

Cover Page
Half Title Page
Title Page
Copyright Page
Contents
Preface
Part I Introduction
Part II The Fundamentals of Prolog Programming
Part III Natural Language Processing with Prolog
Solutions to Exercises
Glossary of Terms
Bibliography
Index