Languages & Linguistics

Parsing

Parsing refers to the process of analyzing a string of symbols according to the rules of a formal grammar. In linguistics, parsing involves breaking down a sentence into its constituent parts to understand its structure and meaning. This process is essential for natural language processing, syntax analysis, and understanding the grammatical structure of languages.
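In computational terms, parsing applies grammar rules to a token string and returns a constituent tree. The following is a minimal sketch, assuming an invented toy grammar, lexicon, and sentence; real parsers use far richer grammars and must handle ambiguity.

```python
# A toy context-free grammar and a naive recursive-descent parser,
# illustrating how a string is analyzed into constituents.
# Grammar, lexicon, and sentence are invented for illustration.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "saw": "V",
}

def parse(symbol, words, i):
    """Try to derive `symbol` from words[i:]; return (tree, next_index) or None."""
    # Terminal: match a word whose lexical category equals the symbol.
    if symbol not in GRAMMAR:
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return (symbol, words[i]), i + 1
        return None
    # Non-terminal: try each production left to right.
    for production in GRAMMAR[symbol]:
        children, j = [], i
        for part in production:
            result = parse(part, words, j)
            if result is None:
                break
            subtree, j = result
            children.append(subtree)
        else:
            return (symbol, children), j
    return None

tree, end = parse("S", "the dog saw a cat".split(), 0)
print(tree)
# ('S', [('NP', [('Det', 'the'), ('N', 'dog')]),
#        ('VP', [('V', 'saw'), ('NP', [('Det', 'a'), ('N', 'cat')])])])
```

A string that violates the grammar yields None here, the sketch's analogue of an ungrammaticality judgment.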

Written by Perlego with AI-assistance

7 Key excerpts on "Parsing"

  • Persian Computational Linguistics and NLP
    • Katarzyna Marszałek-Kowalewska (Author)
    • 2023 (Publication Date)
    5  Syntactic Parsing of Persian: From Theory to Practice
    Masood Ghayoomi
    Abstract
    Syntactic analysis of a sentence is the preliminary knowledge required to understand a natural language. More precisely, it is a fundamental step towards deeper language understanding through semantic and discourse processing. Two components play a role in syntactic analysis: (1) Part-Of-Speech (POS) tagging, which determines the syntactic role of each word in its local context, and (2) Parsing, which provides the tree structure of a sentence by determining and labeling the relations between its words. To obtain a reliable result, this syntactic analysis should be carried out within a grammar formalism. The major approaches used in Parsing algorithms are rule-based and statistical. In this chapter, we focus on POS tagging and Parsing in general, and on the research on syntactic analysis of Persian sentences in particular.
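As an illustration of these two components, the sketch below pairs an invented English sentence with hand-assigned POS tags and labeled head-dependent relations; the tagset and relation labels are simplifications chosen for this example, not the output of an actual analyzer.

```python
# Illustrative data only: the two components of syntactic analysis,
# shown for an invented English sentence.
sentence = ["the", "student", "reads", "a", "book"]

# (1) POS tagging: each word's syntactic role in its local context.
pos_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]

# (2) Parsing: labeled relations between words; each entry is
#     (dependent index, head index, relation label), 0-based,
#     with the verb as the root of the tree.
relations = [
    (0, 1, "det"),   # the     <- student
    (1, 2, "subj"),  # student <- reads
    (3, 4, "det"),   # a       <- book
    (4, 2, "obj"),   # book    <- reads
]

for dep, head, label in relations:
    print(f"{sentence[dep]:8} --{label}--> {sentence[head]}")
```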

    5.1  Introduction

    Computational linguistics is an interdisciplinary field of study that tries to understand and generate human language with computers. To understand human language, linguistic phenomena should be learned by a computer; to reach this goal, they have to be defined algorithmically. To understand a language deeply, three of the five major linguistic components, i.e. syntax, semantics, and discourse, should be analyzed automatically. Among them, syntactic analysis plays a major role as a bridge towards the deep linguistic analyses, i.e. semantic and discourse analyses (Allen 1994).
    To define human natural language in a computer, it has to be formalized so that the natural language can be modeled. Chomsky (1957) described the syntactic property of a language formally. Based on Chomsky (1957
  • Psychology of Language (PLE: Psycholinguistics)

    An Introduction to Sentence and Discourse Processes

    3 Syntax and Parsing Processes
    In order to understand a sentence, it is necessary to analyze it into its grammatical elements, called constituents. This analysis is known as Parsing. Accomplishing this task requires knowledge of the grammar of one’s native language. The fact that people are routinely in possession of this knowledge is reflected by their ability to discriminate between grammatical and ungrammatical sentences. In this regard, there is no doubt that (1a) is an acceptable English sentence, but (1b) is not.
    (1)   a.   Wild beasts frighten little children.
      b. *Beasts children frighten wild little.
    Likewise, (2a) and (2b) are acceptable ways of expressing Jane’s thoughts about the crop, but (2c) is not.
    (2)   a.   Jane thought the crop was not healthy.
      b. Jane did not think the crop was healthy.
      c. *Jane thought not the crop was healthy.
    People’s grammatical skills also permit them to recognize that the sentence, Colorless green ideas sleep furiously, complies with the rules of English, even though it is anomalous (Chomsky, 1957).
    The ability to judge the grammaticality of a sentence reflects people’s knowledge of the ideal form of their language, knowledge that is referred to as linguistic competence (Chomsky, 1957). Linguistic competence does not enable people to articulate the grammatical rules of their native tongue. Only instruction in linguistics provides this capability. Therefore, one’s intuitive linguistic knowledge is procedural in nature (see chapter 1 ).
    People’s utterances, their language performance (Chomsky, 1957), frequently deviate from the ideal grammatical form. Consider the following fragment from the Watergate transcripts:
    President Nixon: Let me say with regard to Colson—and you can say that I’m way ahead of them on that—I’ve got the message on that and that he feels that Dean—but believe me I’ve been thinking about that all day yesterday—whether Dean should be given immunity. (Watergate Transcripts
  • Psycholinguistics (PLE: Psycholinguistics)
    4 Parsing – the computation of syntactic structure
    When the words in a sentence have been recognized and their syntactic categories retrieved from the mental lexicon, the language understanding system must compute the structural relations between those words, so that it can go on to determine the message that the sentence conveys. It should be stressed once again, however, that this sketch of the temporal relations between these processes is not meant to suggest that they operate in a strictly serial order. This point will be made clearer in chapter 8 , where the question of interaction between subprocessors will be discussed in detail.
    Before describing how structural information is computed it is necessary to consider when and why it is needed in comprehension. It has been suggested (e.g. Schank, 1972; Small and Rieger, 1982) that syntactic analysis can be bypassed, and sentence meaning derived directly from word meanings. In fact, it is not in general possible to forgo a structural analysis, though it may sometimes be. This point can be illustrated by considering one of the methods suggested for eliminating a full parse – the use of case-frames. In a number of theories of word meaning (for example, Schank’s) every verb has one or more case-frames linked to it, which specify (i) the case-roles, such as AGENT, PATIENT and INSTRUMENT, that are associated with the action it denotes, and (ii) which of these case-roles are obligatory and which optional in sentences containing the verb. As an illustration of how a case-frame analysis is supposed to eliminate the need for a complete parse, consider the sentence:
    The boy watered the flower.
    The analysis proceeds as follows. First watered is identified as the main verb of the sentence. Its associated case-frame is then retrieved from semantic memory. This case-frame states that the verb requires two case fillers, an AGENT and a PATIENT, and it also imposes selectional restrictions on those case fillers, which delimit the kind of things that can take the case roles. For example, the AGENT that does the watering must be either animate or a natural agency such as rain. From the semantic information in the lexical entries for boy and flower the referents of the noun phrases in which these nouns occur can be assigned to the appropriate case roles, since, of the two noun phrases, only the boy
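The steps just described can be sketched as follows; the case-frame for watered, the feature sets, and the assign_roles helper are invented simplifications of the procedure, not Schank's actual formalism.

```python
# Sketch of a case-frame analysis: the verb's frame lists its case
# roles, each with a selectional restriction on the filler's
# semantic features. All entries are invented for illustration.

CASE_FRAMES = {
    "watered": {
        "AGENT":   lambda f: "animate" in f or "natural_agency" in f,
        "PATIENT": lambda f: "plant" in f or "ground" in f,
    }
}

# Toy semantic features retrieved from the lexicon.
FEATURES = {"boy": {"animate"}, "flower": {"plant"}}

def assign_roles(verb, noun_phrases):
    """Assign each NP to the first unfilled case role whose restriction it satisfies."""
    frame = CASE_FRAMES[verb]
    roles = {}
    for np in noun_phrases:
        for role, restriction in frame.items():
            if role not in roles and restriction(FEATURES[np]):
                roles[role] = np
                break
    return roles

print(assign_roles("watered", ["boy", "flower"]))
# {'AGENT': 'boy', 'PATIENT': 'flower'}
```

Because role assignment relies on the selectional restrictions rather than on word order, the same roles are recovered even if the noun phrases are supplied in the opposite order, which is why a full parse can seem dispensable for such simple sentences.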
  • Cognition

    From Memory to Creativity

    • Robert W. Weisberg, Lauretta M. Reeves (Authors)
    • 2013 (Publication Date)
    • Wiley
      (Publisher)
    In Chapter 9 we contrasted the rule-based theory of word learning with an emergent view, which proposed that word learning came about through the child's use of general—rather than language-specific—learning mechanisms. A similar difference of opinion has arisen concerning the mechanisms involved in sentence processing. In contrast to the Chomskian perspective, the emergent view of language processing proposes that grammar is simply a specific function of the cognitive system (Croft & Cruse, 2004). Syntax is not based on content-free rules, nor is it a separate subcomponent of language. People use their conceptual knowledge about word meanings to separate words into nouns, verbs, and so on. They then use powerful general learning mechanisms to determine statistical regularities in the sentences they hear around them. Those regularities, rather than rules, are used to construct new, and increasingly complex, sentences. We examine evidence relevant to those two views of language later in the chapter. First we turn to an examination of research concerning language processing.

    Processing Sentences: Determining Syntactic Structure of a Sentence

    When a fluent speaker of a language hears a sentence in a conversation, the first step in interpretation involves Parsing the sentence into its phrase structure. There are two potential Parsing strategies. The first assumes that the listener tries to hold the words in working memory (WM) until the whole sentence has been heard, and then determine the s-structure. This strategy would put a large burden on WM, especially for long sentences. The other possible sentence-processing strategy is the immediacy of interpretation method (Whitney, 1998), in which a sentence is interpreted immediately, as each word or phrase comes in. In this way, one might be able to parse a sentence while putting a limited burden on working memory, because very few words are held in WM at any given time. The listener continually adds to the syntactic structure he or she has already developed, and words already processed can then be dropped from WM. For example, if you hear Jim and Mary…, you assume that the sentence begins with an NP, so you can turn your attention to the next set of words in the sentence. In addition, one can use one's knowledge about typical sentence structure in a top-down fashion, and assume that a verb phrase will follow (e.g., Jim and Mary left early.), which may also facilitate processing. We often use expectations about the most likely way a sentence will work out to begin interpretation. Based on this view, sentence processing can be seen as a further example of the role of expectancies and top-down processing in cognitive processing.
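A schematic sketch of the immediacy-of-interpretation strategy, using the Jim and Mary example above; the tiny lexicon and the single phrase-closing rule (a verb signals that the subject NP is complete) are invented simplifications of what a human parser does.

```python
# Immediacy of interpretation, schematically: each incoming word is
# attached at once, so working memory (WM) holds only the current,
# unfinished phrase rather than the whole sentence.

LEXICON = {"Jim": "N", "Mary": "N", "and": "Conj", "left": "V", "early": "Adv"}

def parse_incrementally(words):
    structure = []  # phrases already built (no longer in WM)
    buffer = []     # working memory: words of the current phrase
    for word in words:
        buffer.append(word)
        if LEXICON[word] == "V":
            # A verb signals the subject NP is complete: close it off,
            # freeing those words from working memory.
            structure.append(("NP", buffer[:-1]))
            buffer = [word]
        print(f"heard {word!r:9} WM holds {buffer}")
    structure.append(("VP", buffer))
    return structure

print(parse_incrementally(["Jim", "and", "Mary", "left", "early"]))
```

The trace shows working memory shrinking back to one word as soon as the subject NP is closed, rather than growing with sentence length.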
    Evidence that people process sentences using immediacy of interpretation comes from the study of garden-path
  • Sentence Processing
    Although bilingual sentence processing has only recently become a topic of interest, researchers are beginning to build a picture of how bilinguals comprehend the L2 input in real time in comparison to native speakers. However fundamental these differences may be, it is clear that it is grammatical processing that differs between bilingual and native speaker Parsing. The above overview indicates that different processing modes may be in operation, and that if bilinguals are motivated to pay attention to the morpho-syntactic details of the input and given high enough proficiency and/or cognitive capacity, they can perform deep on-line analyses. Otherwise, their processing is shallower than that of native speakers.
    One of the many questions that remain concerns the role of Parsing in the learning process. Some work has looked at how sensitive lower-proficiency learners are to morpho-syntactic information (e.g., Keating, 2009; Schimke, 2009; Tokowicz & MacWhinney, 2005), but there is very little work on how Parsing the input helps to develop linguistic knowledge in either L1 or L2 (see Dekydtspotter, Kim, et al., 2008 and Fodor, 1998, for discussion; also Osterhout & colleagues, e.g., McLaughlin, Tanner, Pitkänen, Frenck-Mestre, Inoue, Valentine, & Osterhout, 2010, on EEG findings of the grammaticalisation process in L2 acquisition), yet this is an important topic, given that learners at some point must parse the input with limited grammatical knowledge. It is hoped that such future work – with that on the effects of individual difference factors such as cognitive capacity and proficiency/exposure – will shed light on the human sentence-processing mechanism in general, and thus will be of great interest to researchers in the fields of both language acquisition and sentence processing.
    Notes
    1      Most of the studies reported in this overview focus on ‘late’ L2 learners, that is, those acquiring the L2 after puberty, rather than ‘simultaneous’ bilinguals who acquire two languages from birth. The terms ‘bilingual’ and ‘second language learner’ are used interchangeably throughout, and where the participants differ from this post-childhood learner group, it is noted in the text.
    2
  • Formalizing Natural Languages
    • Max Silberztein (Author)
    • 2016 (Publication Date)
    • Wiley-ISTE
      (Publisher)
    7.13. Syntax trees are designed to represent sentence structure, whereas parse trees represent the structure of the grammar visited during Parsing.
    Remember that parse trees represent derivation chains that are constructed when Parsing a sentence and exploring its grammar, and are therefore particularly useful for “debugging” a grammar. This is why many syntactic parsers used in NLP offer these parse trees as results of syntactic analysis; by doing this, these pieces of software equate sentence structure and grammar structure.
    The problem then is that each time the grammar is modified, the parse tree given for the same sentence is also modified. For example, if we remove the NP recursions from the grammar in Figure 7.11, we obtain the simplified grammar in Figure 12.14 and, with it, a new parse tree (Figure 12.15).
    Figure 12.14. Simplified grammar
    Figure 12.15. Another parse tree for a simplified grammar
    Compare this new tree with that in Figure 7.12: the two parse trees are different, even though the two grammars are equivalent and the sentence is unchanged.
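The contrast can be reproduced with two invented toy analyses of the same phrase: a grammar with NP recursion (NP -> NP PP) and an equivalent flattened one (NP -> Det N PP) accept exactly the same strings yet assign different trees.

```python
# Illustrative data only: "the man in the park" analyzed under two
# equivalent toy grammars. Grammar 1 uses NP recursion; Grammar 2
# flattens it. Same words, different bracketing.

tree_recursive = (
    "NP",
    ("NP", ("Det", "the"), ("N", "man")),
    ("PP", ("P", "in"),
           ("NP", ("Det", "the"), ("N", "park"))),
)

tree_flat = (
    "NP",
    ("Det", "the"), ("N", "man"),
    ("PP", ("P", "in"),
           ("NP", ("Det", "the"), ("N", "park"))),
)

def leaves(tree):
    """Recover the words from a tree, ignoring its bracketing."""
    if isinstance(tree, str):
        return []
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree[1]]  # preterminal node: (category, word)
    return [w for child in tree[1:] for w in leaves(child)]

# Same sentence, different structure:
print(leaves(tree_recursive) == leaves(tree_flat))  # True
print(tree_recursive == tree_flat)                  # False
```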
    Typically, when building large-scale grammars for a substantial portion of a language, it will be necessary to organize them, independently from the structure of the sentences to be analyzed. For example, the grammar in Figure 12.10, which is still a simple grammar, contains 40 graphs: there are in fact several types of transitive sentences (declarative, interrogative, nominal), several types of Noun Phrases (subject, object), several types of proper names (person, place), several types of verbal groups (verbs conjugated in simple or compound tenses, with auxiliary, aspectual and/or modal verbs), several types of determiners (definite, indefinite, nominal, adverbial), etc. As a consequence, the parse tree showing the analysis of the sentence Joe thinks that Lea cannot see Ida is rather lengthy (Figure 12.16
  • The Development of Language and Language Researchers
    eBook - ePub
    Certainly, history has shown that something in my graduate student logic must have been amiss. The study of child language has become a thriving industry whereas the study of adult performance is kept alive by a few obsessional types, myself only intermittently included. Worse still, the "easy" problem I picked instead of language acquisition remains unsolved to this day. That problem is the "Parsing" problem: Simply put, how does the listener assign a syntactic analysis to the sentences of his language in the course of comprehension? In the mid-1960s, this problem seemed just about the right size. Enough linguistics had been done, or so it seemed, to specify a reasonably accurate picture of the language user's syntactic knowledge. The problem was to show how that knowledge is put to use during Parsing. Admittedly, the linguist's picture of syntactic knowledge was a bit abstract and the framework of transformational grammar made for awkward incorporation in a mechanical Parsing system. But that just left a few interesting subproblems for psycholinguists to solve: principally, to specify the grammar in a usable format and to determine the dimensions of the Parsing system that puts the grammar to use. Once I learned about augmented transition network grammars from Thorne, Bratley, and Dewar (1968) and Woods (1970), I thought we might have a solution to the format problem. But testing that conjecture unambiguously while at the same time testing hypotheses about the parameters of the Parsing system has proved much more difficult than anyone imagined. In consequence, it is still too soon to cash that promise I made to Roger so long ago in the William James lunch line. Much as we might like to, those of us who study adult linguistic performance cannot yet tell those of you who study language acquisition just what it is that your little language learners are learning to do.
    Instead, we can only tell you something of our struggle in the last few years and suggest some implications for work in language acquisition. If this makes a modest birthday present, I will just have to count on Roger to give me many more opportunities to do better.

    The Dimensions of the Parsing Problem

    There is general agreement about what the Parsing system does. The question is how
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.