I SPEECH PROCESSING, ITS SPECIFICITY AND ITS RELATION TO READING
1 The Relation of Speech to Reading and Writing
Alvin M. Liberman
Haskins Laboratories, New Haven, CT, USA
Theories of reading–writing and theories of speech typically have in common that neither takes proper account of an obvious fact about language that must, in any reckoning, be critically relevant to both: There is a vast difference in naturalness (hence ease of use) between its spoken and written forms. In my view, a theory of reading should begin with this fact, but only after a theory of speech has explained it.
My aim, then, is to say how well the difference in naturalness is illuminated by each of two theories of speech (one conventional, the other less so) and then, in that light, to weigh the contribution that each of these can make to an understanding of reading and writing and the difficulties that attend them. More broadly, I aim to promote the notion that a theory of speech and a theory of reading–writing are inseparable, and that the validity of the one is measured, in no small part, by its fit to the other.
WHAT DOES IT MEAN TO SAY THAT SPEECH IS MORE NATURAL?
The difference in naturalness between the spoken and written forms of language is patent, so I run the risk of being tedious if I elaborate it here. Still, it is important for the argument I mean to make that we have explicitly in mind how variously the difference manifests itself. Let me, therefore, count the ways.
1. Speech is universal. Every community of human beings has a fully developed spoken language. Reading and writing, on the other hand, are relatively rare. Many, perhaps most, languages do not even have a written form, and when, as in modern times, a writing system is derived (usually by missionaries), it does not readily come into common use.
2. Speech is older in the history of our species. Indeed, it is presumably as old as mankind, having emerged as perhaps the most important of our species-typical characteristics. Writing systems, on the other hand, are developments of the last few thousand years.
3. Speech comes earlier in the history of the individual; reading–writing come later, if at all.
4. Speech must, of course, be learned, but it need not be taught. For learning to speak, the necessary and sufficient conditions are but two: membership of the human race and exposure to a mother tongue. Indeed, given that these two conditions are met, there is scarcely any way that the development of speech can be prevented. Thus, learning to speak is a precognitive process, much like learning to perceive visual depth and distance or the location of sound. In contrast, reading and writing need to be taught, although, given the right ability, motivation, and opportunity, some will infer the relation of script to language and thus teach themselves. But however learned, reading–writing is an intellectual achievement in a way that learning to speak is not.
5. There are brain mechanisms that evolved with language and that are, accordingly, largely dedicated to its processes. Reading–writing presumably engage at least some of these mechanisms, but they must also exploit others that evolved to serve nonlinguistic functions. There is no specialisation for reading–writing as such.
6. Spoken language has the critically important property of "openness": unlike nonhuman systems of communication, speech is capable of expressing and conveying an indefinitely numerous variety of messages. A script can share this property, but only to the extent that it somehow transcribes its spoken-language base. Having no independent existence, a proper (open) script is narrowly constrained by the nature of its spoken-language roots and by the mental resources on which they draw. Still, within these constraints, scripts are more variable than speech.
One dimension of variation is the level at which the message is represented, although the range of that variation is, in fact, much narrower than the variety of possible written forms would suggest. Thus, as DeFrancis (1989) convincingly argues, any script that communicates meanings or ideas directly, as in ideograms, for example, is doomed to arrive at a dead end. Ideographic scripts cannot be open (that is, they cannot generate novel messages), and the number of messages they can convey is never more than the inventory of one-to-one associations between (holistically different) signals and distinctly different meanings that human beings can master. Indeed, it is a distinguishing characteristic of language, and a necessary condition of its openness, that it communicates meanings indirectly, via specifically linguistic structures and processes, including, nontrivially, those of the phonological component. Not surprisingly, scripts must follow suit; in the matter of language, as with so many other natural processes, it is hard to improve on nature.
Constraints of a different kind apply at the lower levels. Thus, the acoustic signal, as represented visually by a spectrogram, for example, cannot serve as a basis for a script; although spectrograms can be puzzled out by experts, they, along with other visual representations, cannot be read fluently. The reason is not primarily that the relevant parts of the signal are insufficiently visible; it is, rather, that, owing to the nature of speech, and especially to the coarticulation that is central to it, the relation between acoustic signal and message is complex in ways that defeat whatever cognitive processes the "reader" brings to bear. Narrow phonetic transcriptions are easier to read, but there is still more context-, rate-, and speaker-conditioned variation than the eye is comfortable with. In any case, no extant script offers language at a narrow phonetic level. To be useable, scripts must, apparently, be pitched at the more abstract phonological and morphophonological levels. That being so, and given that reading–writing require conscious awareness of the units represented by the script, we can infer that people can become conscious of phonemes and morphophonemes. We can also infer about these units that, standing above so much of the acoustic and phonetic variability, they correspond approximately to the invariant forms in which words are presumably stored in the speaker's lexicon. A script that captures this invariance certainly has advantages. At all events, some scripts (e.g. Finnish, Serbo-Croatian) do approximate to purely phonological renditions of the language, while others depart from a phonological base in the direction of morphology. Thus, English script is rather highly morphophonological, Chinese even more so.
But, as DeFrancis (1989; see also Wang, 1981) makes abundantly clear, all these scripts, including even the Chinese, are significantly phonological, and, in his view, they would fail if they were not; the variation is simply in the degree to which some of the morphology is also represented.
Scripts also vary somewhat, as speech does not, in the size of the linguistic segments they take as their elements, but here, too, the choice is quite constrained. Surely, it would not be correct to make a unit of the script equal to the phoneme and a half, a third of a syllable, or some arbitrary stretch (say 100 milliseconds) of the speech stream. Still, scripts can and do take as their irreducible units either phonemes or syllables, so in this respect, too, they are more diverse than speech.
7. All of the foregoing differences are, of course, merely reflections of one underlying circumstance, namely, that speech is a product of biological evolution, whereas writing systems are artifacts. Indeed, an alphabet (the writing system that is of most immediate concern to us) is a triumph of applied biology, part discovery, part invention. The discovery (surely one of the most momentous of all time) was that words do not differ from each other holistically, but rather by the particular arrangement of a small inventory of the meaningless units they comprise. The invention was simply the notion that if each of these units were to be represented by a distinctive optical shape, then everyone could read and write, provided they knew the language and were conscious of the internal phonological structure of its words.
HOW IS THE DIFFERENCE IN NATURALNESS TO BE UNDERSTOOD?
Having seen in how far speech is more natural than reading–writing, we should look first for a simple explanation, one that is to be seen among the surface appearances of the two processes. But when we search there, we are led to conclude, in defiance of the most obvious facts, that the advantage must lie with reading–writing, not with speech. Thus, it is the eye, not the ear, that is the better receptor; the hand, not the tongue, that is the more versatile effector; the print, not the sound, that offers the better signal-to-noise ratio; and the discrete alphabetic characters, not the nearly continuous and elaborately context-conditioned acoustic signal, that offer the more straightforward relation to the language. To resolve this seeming paradox and to find the enlightenment we seek, we shall have, therefore, to look more deeply into the biology of speech. To that end, I turn to two views of speech to see what each has to offer.
The Conventional View of Speech as a Basis for Understanding the Difference in Naturalness.
The first assumption of the conventional view is so much taken for granted that it is rarely made explicit. It is, very simply, that the phonetic elements are defined as sounds. This is not merely to say the obvious, which is that speech is conveyed by an acoustic medium, but rather to suppose, in a phrase made famous by Marshall McLuhan, that the medium is the message.
The second assumption, which concerns the production of these sounds, is also usually unspoken, not just because it is taken for granted, although surely it is, but also because it is apparently not thought by conventional theorists to be even relevant. But, whatever the reason, one finds among the conventional claims none that implies the existence of a phonetic mode of action, that is, a mode adapted to phonetic purposes and no other. One therefore infers that the conventional view must hold (by default, as it were) that no such mode exists. Put affirmatively, the conventional assumption is that speech is produced by motor processes and movements that are independent of language.
The third assumption concerns the perception of speech sounds, and, unlike the first two, is made explicitly and at great length (Cole & Scott, 1974; Crowder & Morton, 1969; Diehl & Kluender, 1989; Fujisaki & Kawashima, 1970; Kuhl, 1981; Kuhl & Miller, 1975; Miller, 1977; Oden & Massaro, 1978; Stevens, 1975). In its simplest form, it is that perception of speech is not different from perception of other sounds; all are governed by the same general processes of the auditory system. Thus, language simply accepts representations made available to it by perceptual processes that are generally auditory, not specifically linguistic. So, just as language presumably recruits ordinary motor processes for its own purposes, so, too, does it recruit the ordinary processes of auditory perception; at the level of perception, as well as action, there is, in the conventional view, no specialisation for language.
The fourth assumption is required by the second and third. For if the acts and percepts of speech are not, by their nature, specifically phonetic, they must necessarily be made so, and that can be done only by a process of cognitive translation. Presumably, that is why conventional theorists say about speech perception that after the listener has apprehended the auditory representation they must elevate it to linguistic status by attaching a phonetic label (Crowder & Morton, 1969; Fujisaki & Kawashima, 1970; Pisoni, 1973), fitting it to a phonetic prototype (Oden & Massaro, 1978), or associating it with some other linguistically significant entity, such as a "distinctive feature" (Stevens, 1975).
I note, parenthetically, that this conventional way of thinking about speech is heir to two related traditions in the psychology of perception. One, which traces its origins to Aristotle's enumeration of the five senses, requires of a perceptual mode that it have an end organ specifically devoted to its interests. Thus, ears yield an auditory mode; eyes, a visual mode; the nose, an olfactory mode; and so on. Lacking an end organ of its very own, speech cannot, therefore, be a mode. In that case, phonetic percepts cannot be the immediate objects of perception; they can only be perceived secondarily, as the result of a cognitive association between a primary auditory representation appropriate to the acoustic stimulus that excites the ear (and hence the auditory mode) and, on the other hand, some cognitive form of a linguistic unit. Such an assumption is, of course, perfectly consistent with another tradition in psychology, one that goes back at least to the beginning of the 18th century, where it is claimed in Berkeley's New theory of vision (1709) that depth (which cannot be projected directly onto a two-dimensional retina) is perceived by associating sensations of muscular strain (caused by the convergence of the eyes as they fixate objects at various distances) with the experience of distance. In the conventional view of speech, as in Berkeley's assumption about visual depth, apprehending the event or property is a matter of perceiving one thing and calling it something else.
Some of my colleagues and I have long argued that the conventional assumptions fail to account for the important facts about speech. Here, however, my concern is only with the extent to which they enlighten us about the relation of spoken language to its written derivative. That the conventional view enlightens us not at all becomes apparent when one sees that, in contradiction of all the differences enumerated earlier, it leads to the conclusion that speech and reading–writing must be equally natural. To see how comfortably the conventional view sits with an (erroneous) assumption that speech and reading–writing are psychologically equivalent, one need only reconsider the four assumptions of that view, substituting, where appropriate, "optical" for "acoustic" or "visual" for "auditory".
One sees then that, just as the phonetic elements of speech are, by the first of the conventional assumptions, defined as sounds, the elements of a writing system can only be defined as optical shapes. As for the second assumption (namely, that speech production is managed by motor processes of the most general sort), we must suppose that this is exactly true for writing; by no stretch of the imagination can it be supposed that the writer's movements are the output of an action mode that is specifically linguistic. The third assumption of the conventional view of speech also finds its parallel in reading–writing, for, surely, the percepts evoked by the optical characters are ordinarily visual in the same way that the percepts evoked by the sounds of speech are supposed to be ordinarily auditory. Thus, at the level of action and perception, there is in reading–writing, as there is assumed to be in speech, no specifically linguistic mode. For speech, that is only an assumption (and, as I think, a very wrong one), but for reading–writing it is an incontrovertible fact; the acts and percepts of reading–writing did not evolve as part of the specialisation for language, hence they cannot belong to a natural linguistic mode.
The consequence of all this is that the fourth of the conventional assumptions about speech is, in fact, necessary for reading–writing and applies perfectly to it: Like the ordinary, nonlinguistic auditory and motor representations seen in the conventional view of speech, the correspondingly ordinary visual and motor representations of reading–writing must somehow be made relevant to language, and that can only be done by a cognitive process; the reader–writer simply has to learn that certain shapes refer to units of the language and that others do not.
It is this last assumption that most clearly reveals the flaw that makes the conventional view useless as a basis for understanding the most important difference between speech and reading–writing, namely, that the evolution of the one is biological, the other cultural. To appreciate the nature of this shortcoming, we must first consider how either mode of language transmission meets a requirement that is imposed on every communication system, whatever its nature and the course of its development. This requirement, which is commonly ignored in arguments about the nature of speech, is that the parties to the message exchange must be bound by a common understanding about which signals, or which aspects of which signals, have communicative significance; only then can communication succeed. Mattingly and I have cal...