Anaphora Resolution and Text Retrieval

Name: Anaphora Resolution and Text Retrieval
Author: Helene Schmolz

318 pages
English

About This Book

The series reflects the fact that the empirical description of written and spoken language, based on qualitative or quantitative corpus analysis, has by now established itself as a central paradigm within linguistics. A usage-based orientation can be observed at all levels of linguistic description, ranging from approaches in phonology and prosody research through empirical work in morphology, syntax and semantics to pragmatic approaches such as discourse, text and conversation analysis as well as media linguistics and sociolinguistics. The aim of the series is to provide a thematically open platform for different approaches within synchronically oriented linguistics, and for interdisciplinary work with a linguistic focus, that point to innovative ways of working empirically and develop new methods and theoretical models on the basis of data.
The series publishes monographs and edited volumes that take a synchronic, data-based approach to language analysis. The language of publication is either German or English. All contributions are peer-reviewed.

Open Access:
Thanks to a pilot project with the FID Linguistik, six new titles are being published Open Access between 2019 and 2021. In addition, the previously published e-books of volumes 1 to 9 have been retroactively converted into Open Access publications.
https://www.linguistik.de/

External reviewers:
Magnus P. Ängsal (Gothenburg),
Michael Beißwenger (Duisburg-Essen),
Pia Bergmann (Jena),
Noah Bubenhofer (Dresden),
Helen Christen (Fribourg),
Waldemar Czachur (Warsaw),
Ulla Fix (Leipzig),
Karina Frick (Zurich),
Stephan Habscheid (Siegen),
Jörg Hagemann (Freiburg),
Mathilde Hennig (Gießen),
Katharina König (Münster),
Alfred Lameli (Marburg),
Jens Lanwer (Münster),
Konstanze Marx (Mannheim),
Marcus Müller (Darmstadt),
Thomas Niehr (Aachen),
Martin Pfeiffer (Freiburg),
Hannes Scheutz (Innsbruck),
Anja Stukenbrock (Lausanne),
Georg Weidacher (Graz),
Evelyn Ziegler (Duisburg-Essen),
Alexander Ziem (Düsseldorf).

Information

Publisher

De Gruyter

Year

2015

ISBN

9783110416817

Topic

Languages & Linguistics

Subtopic

Syntax in Linguistics

1 Introduction

Searching for information on the World Wide Web is useful to many people in many different contexts. As the Internet continues to grow rapidly, with ever more resources such as hypertexts, documents or multimedia files being added every second, the challenge of providing users with the specific contents they need becomes more and more important. Current search engines are far from perfect as they hardly ever return completely satisfying results. One reason for this is that search engines, or more specifically text retrieval systems, usually do not consider the semantics of a text but rather just conduct a statistical analysis. Search engines, for example, will tend to rank a text in which university occurs ten times above a text containing only four occurrences of that item. This approach, however, cannot adequately represent the text’s content, nor is a simple “bag-of-words” approach, where each text is merely seen as containing “an unordered set of words” (Baeza-Yates & Castillo 2006: 527), sufficient (cf. Jurafsky & Martin 2009: 801). Accordingly, the results returned by Web retrieval systems are commonly not subjected to any closer examination of the text’s topics, let alone a linguistic analysis. Thus, search systems tend to use purely quantitative, rather than qualitative linguistic methods.
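The frequency-based ranking described above can be illustrated with a minimal sketch. It is not taken from the book and does not reproduce any real search engine: the function names `bag_of_words` and `rank_by_term_frequency`, the crude tokenisation and the two sample documents are all assumptions made for illustration only.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased word tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def rank_by_term_frequency(query_term: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents purely by how often the query term occurs in them."""
    term = query_term.lower()
    scores = {name: bag_of_words(text)[term] for name, text in documents.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical documents: both are about the same university throughout,
# but doc_b refers back to it with the anaphor "it".
docs = {
    "doc_a": "The university opened. The university grew. The university thrived.",
    "doc_b": "The university opened in 1850. It grew quickly. It thrived.",
}

ranking = rank_by_term_frequency("university", docs)
print(ranking)
```

In this toy setting, doc_b scores lower only because its later mentions of the university are pronouns, which a purely statistical count does not link back to the query term.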

One approach to analysing a text linguistically is to investigate its cohesion, and here, more specifically, to pay attention to anaphors. The aim of this book is to outline anaphors of English and to examine to what extent they are worth being considered in text retrieval systems. Although anaphors and their resolution are a highly debated issue in present research, there are few studies that explore anaphors in the context of text retrieval. Even research in the field of anaphora resolution that is not intended for text retrieval shows a number of deficiencies.

To start with, a comprehensive classification of anaphor types based on linguistic description and also with regard to text retrieval systems is missing. Text retrieval systems would profit from a thorough examination because more precise rules for resolving anaphors could be formulated. The standard work for computational anaphora resolution is Mitkov’s book Anaphora Resolution (2002). However, Mitkov’s classification is not satisfying from a linguistic point of view because it does not take into account the many different types of anaphors and their features.

A further weakness is that in the discussion of anaphor types, no approach pays attention to non-finite clause anaphors. Not only are non-finite clauses disregarded as one type of anaphor in existing text retrieval systems, but they are even frequently ignored as one type of anaphor in linguistics, e.g. in Stirling & Huddleston (2010). Although Quirk et al. (2012: 910) mention non-finite clauses as a special type of ellipsis, they do not discuss whether non-finite clauses are a type of anaphor or not.

An additional shortcoming lies in the scarcity of annotated corpora. The few corpora that are annotated are mostly not freely accessible. Furthermore, current annotations of corpora only contain a few anaphor types, which is why these corpora are not adequate for an examination in this book.

All in all, there are many desiderata in the field of anaphora resolution. This book will contribute to their investigation from a linguistic and computational point of view. It draws particularly on syntactic, text linguistic and corpus linguistic methods as well as on methods from text retrieval and natural language processing. This book will first examine the question of what anaphor types the English language shows. For that purpose, a linguistic definition of anaphors is needed (cf. chapter 2), before a classification of anaphor types is presented (cf. chapter 3). Second, the frequency of each type of anaphor in hypertexts will be analysed (cf. chapter 4). From these insights, research questions for computational anaphora resolution can then be formulated (cf. chapter 4.5).

In more detail, the book is structured as follows. The second chapter will define anaphors and discuss related concepts. It will conclude with six conditions or characteristics of anaphors, all of which have to apply to items in order to be regarded as anaphors. In the third chapter, the twelve types of anaphors will be described in detail. The grammatical features of each anaphor type will be explained in depth, which is subsequently also of importance for computational anaphora resolution. The fourth chapter will examine the frequency of anaphors in hypertexts. Here, a corpus including different types of hypertexts will be introduced and statistically investigated with regard to anaphor types. The fifth chapter will describe text retrieval systems in general and for retrieving hypertexts from the Internet specifically, and the types of natural language processing methods these systems use. The sixth chapter will then present computational anaphora resolution, i.e. current approaches and applications, and the structure and evaluation of anaphora resolution systems. In the seventh chapter, non-finite clause anaphors will be analysed with respect to computational anaphora resolution, applying the insights of chapter four about the frequency of anaphors. Rules for identifying anaphors as well as for assigning antecedents will be established. Finally, the results will be discussed and perspectives for future research will be offered.

2 Linguistic fundamentals of anaphors and anaphora

2.1 Basic definitions

The word anaphora originates from Greek ana- (“back”) and pherein (“to bear”) and entered English via Latin transmission (cf. “Anaphora” 2010). In English, it is documented for the first time in 1589 (cf. Simpson & Weiner 1989: 436-437):

Anaphora, or the Figure of Report. Repetition in the firſt degree we call the figure of Report according to the Greeke originall, and is when we make one word begin, and as they are wont to ſay, lead the daunce to many verſes in ſute, as thus.

To thinke on death it is a miſerie,

To think on life it is a vanitie:

To thinke on the world verily it is,

To thinke that heare man hath no perfit bliſſe.

(Puttenham 1589: 165)

“Anaphora” here denotes the rhetorical figure of repetition. The first written evidence of a use in grammar is not found until 1933, when the term appeared in Bloomfield’s work Language:

[W]hen we say Ask that policeman, and he will tell you, the substitute he means, among other things, that the singular male substantive expression which is replaced by he, has been recently uttered. A substitute which implies this, is an anaphoric or dependent substitute, and the recently-uttered replaced form is the antecedent. (Bloomfield 1984: 249)

Later he gives another example:

The word one […] replaces a with anaphora of the noun […] when no other modifier is present (Here are some apples; take one); […] it is the anaphoric substitute for nouns after an adjective, and in this use forms a plural, ones (the big box and the small one, these boxes and the ones in the kitchen […]). (ibid.: 265-266)

As for derivations, the adjective anaphoric and the adverb anaphorically are first mentioned in 1914 (cf. Bloomfield 1984: 249-251; Simpson & Weiner 1989: 436-437). According to the Oxford Dictionary of English (2005, 2nd rev. ed.), the noun anaphor has its origin in a backformation of anaphora, which dates back to the 1970s (cf. Soanes & Stevenson 2005: 55).

In current dictionaries, the entry for anaphora is often divided into different senses, depending on the context of use. First, the term denotes a part of the mass in liturgics. Second, “anaphora” describes the “repetition of a word or phrase at the beginning of successive clauses, lines of verse, etc.” (Agnes et al. 2007: 51) in rhetoric (cf. Wilpert 2001: 27). It is thus still used in the sense in which it was first attested (cf. Puttenham 1589, above). Third, “anaphora” is used in music for the repetition of a voice, usually the bass (cf. Bartel 2007: 90-95).

The fourth – grammatical – definition is of importance here: “anaphora” is “[t]he use of a word which refers to, or is a substitute for, a preceding word or group of words” (Simpson & Weiner 1989: 436). According to Valentin (1996: 179), this meaning has developed from the use of anaphora in rhetoric. The following example illustrates what an anaphor is in the grammatical sense of the word:

(1) Susan plays the piano. She likes music.¹

In example (1), the word she is an anaphor² and refers back to a preceding expression, in this case Susan. As can be seen in this example, an anaphor is an item that commonly points backwards.³ Anaphors derive their interpretation from the expressions they refer to because their own meaning is often rather general (cf. Finch 2005: 199-200; Trask & Stockwell 2007: 16-17; Huddleston 2010a: 68; Quirk et al. 2012: 335, 862). This becomes obvious if the second sentence in example (1), She likes music, appears on its own. In such circumstances, it is not possible to find out the person meant by she. We can only state that it is most likely a female person.⁴ But if both sentences are present, she is undoubtedly used in place of Susan.

The linguistic element, or elements, to which an anaphor refers is called the “antecedent”. The antecedent in the preceding example is the expression Susan. The relationship between anaphor and antecedent is termed “anaphora” (cf. Huddleston 2010a: 68-69). “Anaphora resolution” or “anaphor resolution” is the process of finding the correct antecedent of an anaphor (cf. Kübler n.d.: 5; Mitkov 2004a: 269; Crystal 2009: 25). In addition, so-called “anaphoric chains” can arise if anaphors are themselves antecedents. In example (2), the anaphor she refers to the antecedent Ann, and she is also the antecedent of herself (cf. Halliday & Hasan 2008: 15, 52; Stirling & Huddleston 2010: 1457).

(2) Ann knew that she had written the letter herself.
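The anaphoric chain in example (2) can be sketched as a simple lookup structure. This is a minimal illustration and not a resolution algorithm from the book: the anaphor-antecedent links are supplied by hand, and the function name `resolve_chain` and the dictionary `antecedent_of` are hypothetical names introduced here.

```python
def resolve_chain(item: str, antecedent_of: dict[str, str]) -> list[str]:
    """Follow antecedent links from an item until an item with no antecedent is reached."""
    chain = [item]
    seen = {item}
    while chain[-1] in antecedent_of:
        antecedent = antecedent_of[chain[-1]]
        if antecedent in seen:  # guard against circular links in faulty input
            break
        chain.append(antecedent)
        seen.add(antecedent)
    return chain


# Hand-annotated links for example (2): "herself" -> "she" -> "Ann".
antecedent_of = {"herself": "she", "she": "Ann"}

chain = resolve_chain("herself", antecedent_of)
print(chain)  # ['herself', 'she', 'Ann']
```

The sketch only follows links that are already known; finding those links in the first place is the actual task of anaphora resolution.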

Another central aspect of anaphors is that they can vary with regard to the importance of the antecedent for determining reference. Anaphoric noun phrases with a definite article are a case in point. They can, for instance, have antecedents that are not needed for determining the referent of the anaphor, as is shown here:

(3) I went to an amusing show recently where I met two friends.… As they were sitting next to me during the show [1] I was able to ask them about the presenter. However, they could not tell me anything about the show [2].

In this example, the second anaphor the show [2] has the antecedent the show [1]. At first sight, the second anaphor [2] does not seem to gain new information through this relation to the antecedent [1]. But as the antecedent [1] itself is an anaphor and refers to an amusing show, the second anaphor [2] also gains information through these links. In consequence, it makes sense that the second anaphor [2] is interpreted in relation to its identical antecedent [1] (cf. Quirk et al. 2012: 1464-1465).

Recognising anaphors whose antecedents are literally identical with themselves is also important for computational anaphora resolution systems because anaphoric chains can be established through that process. Additionally, when anaphoric chains are detected, the distance between each anaphor and its antecedent does not become unnaturally large. Stirling & Huddleston (2010) argue:

There can be a very large distance between the first antecedent in a chain and the final anaphor, greater than would typically be permitted for a direct link: it is the intermediate links that keep the referent salient in the context of discourse so that reference to it can be made by means of a personal pronoun or other anaphor with little intrinsic content. (ibid.: 1457)
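The point about distance can be made concrete with a small sketch based on example (3). The token positions below are invented for illustration and are not counted from the text; the class `Mention` and the function `link_distances` are likewise hypothetical names, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    position: int  # assumed token offset in the discourse (illustrative values)


def link_distances(chain: list[Mention]) -> list[int]:
    """Distance of each direct link between neighbouring mentions in a chain."""
    return [later.position - earlier.position for earlier, later in zip(chain, chain[1:])]


# Chain from example (3), ordered from first antecedent to final anaphor.
chain = [
    Mention("an amusing show", 4),
    Mention("the show [1]", 21),
    Mention("the show [2]", 41),
]

direct = link_distances(chain)                  # distances of the intermediate links
span = chain[-1].position - chain[0].position   # first antecedent to final anaphor
print(direct, span)
```

Under these assumed positions, every direct link is shorter than the span between the first antecedent and the final anaphor, which is why linking each anaphor to its nearest antecedent keeps the distances small.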

With regard to ...

Empirische Linguistik / Empirical Linguistics
Title Page
Copyright Page
Foreword
Table of Contents
1 Introduction
2 Linguistic fundamentals of anaphors and anaphora
3 Types of anaphors
4 Anaphors in hypertexts
5 Text retrieval and its handling of anaphors
6 Approaches to and uses of anaphora resolution
7 Development of extensive linguistic rules for anaphora resolution: the example of non-finite clause anaphors
8 Conclusion
Bibliography
Index