1
The Frame/Content Theory
Peter F. MacNeilage
Department of Psychology
The University of Texas at Austin
Austin, Texas, USA
The Frame/Content (F/C) theory (MacNeilage, 1998a) attempts to explain the evolution and acquisition of speech. It says, in brief, that speech was made possible by our evolving the capacity to program syllabic “Frames” with consonantal and vocalic “Content” elements, and that the acquisition of that capacity in infants is the key window into how this happened. This theory owes much to four perspectives.
The most important perspective is given in the title of a paper by Dobzhansky: “Nothing in Biology Makes Sense Except in the Light of Evolution” (Dobzhansky, 1973). According to Darwin (1859), evolution occurred by natural selection, resulting in a process of descent with modification, which eventually gave us the entire tree of life. And as Lindblom has often pointed out, most notably in a paper entitled “Can the models of evolutionary biology be applied to phonetic problems?” (1984), this makes it necessary to derive language in general and speech in particular from nonlinguistic phenomena including nonlinguistic precursors. The Frame/Content theory (1998a) is a phylogenetic response to this advocacy in which the initial form of speech is attributed to classic Darwinian natural selection acting on pre-existing, non-speech capacities. However, the elaboration of speech into its modern form with more than 6,000 variants is deemed more attributable to cultural selection.
One aspect of this perspective that apparently cannot be emphasized enough (literally, since some people never get it) is the importance of the time domain. To try to understand the origin of speech simply on the basis of its present form is likely to be an unrewarding enterprise. Speech must have evolved across time, and this must have been a matter of going from simple to more complex forms. So instead of attacking modern speech head-on, it might be more profitable to try to reconstruct the phylogenetically early simple forms, and then try to explain how and why the present-day complexities arose from them. The basic contention here is that the early simple forms probably resembled the early forms of modern infants.
This perspective is in stark contrast to the modern generative linguistic approach to speech, which is based on Platonic Essentialism. For Plato, the world consisted of a number of essences: fixed and unchanging forms (see Toulmin & Goodfield, 1965, p. 40). Perhaps surprisingly to many readers, this idealized view of reality has been almost universally accepted by Western philosophers, most notably Descartes, and, more surprising still, has persisted even after Darwin made Essentialism scientifically obsolete. Chomsky, a modern-day Platonist (and vocal anti-Darwinist), has proposed an innate Universal Grammar that is basically a set of fixed and unchanging forms. He specifies no plausible origin for them and no process of change. Chomsky obviously had language in mind when he said: “You can’t just assume that just because something’s there it is functional, or has been adapted for…. It could be just there” (Chomsky, cited by MacFarquhar, 2003, p. 71). (A detailed comparison between the F/C and the Chomskyan generative approaches to the evolution of speech can be found both in an article by MacNeilage and Davis, 2005a, and in a book by MacNeilage, 2008.)
A second major perspective bears on the question of how Darwinism might apply to the understanding of mental activity in particular. This perspective, increasingly important in modern cognitive science, is called “Embodiment” (Clark, 1997; see also Davis & MacNeilage, 2000). The embodiment perspective holds that mental activity, and the brain activity underlying it, cannot be ultimately explained outside of the context of bodily actions. This makes embodiment a variant of Darwinism, because natural selection is fundamentally based on modifications of successful actions. Roger Sperry, anticipating many years ago the emphasis of the embodiment perspective on action, said that “the entire output of the nervous system consists of nothing but patterns of motor coordination” (Sperry, 1951, pp. 297-298). The F/C theory is driven, both conceptually and methodologically, by Sperry’s suggestion that the best way to understand the mind is to begin with patterns of motor coordination and derive the underlying mental structures from them. According to the F/C theory, the mental structures underlying speech arose from motor structures phylogenetically, and these structures arise from motor structures ontogenetically.
A third important perspective, which led to the particular form of the F/C theory, was laid out by Karl Lashley in a classic 1951 paper entitled “The Problem of Serial Order in Behavior.” That problem, which Lashley considered “the most neglected problem in cerebral physiology” (p. 114), is simply this: How does an organism put together any time-extended action pattern? Speech, of course, is serial organization par excellence. Lashley’s belief that serial-ordering errors in speech were crucial to its understanding was what first got me interested in speech almost half a century ago. For me, solving the serial-ordering problem for speech will allow us to best understand its evolution.
The fourth perspective comes from the discipline of Ethology, the study of naturally occurring animal behavior. Tinbergen (1963) has posed a set of four questions, and has asserted that “a comprehensive, coherent science of Ethology has to give equal attention to all of them and to their integration” (p. 412). These questions were later adopted by Hauser (1996) in his monograph on “The Evolution of Communication.” He asserted that “These perspectives … provide the only fully encompassing and explanatory approach to communication in the animal kingdom including human language” (p. 2). Here are these questions, as presented elsewhere (MacNeilage & Davis, 2005a) with some additions to Hauser’s description. The term “trait” here refers to any enduring behavior characteristic of a species:
1. Mechanistic: “How does it work?” That is, what are the mechanisms (neural, physiological, psychological, etc.) underlying the expression of a trait?
2. Functional: “What does it do for the organism?” That is, what effects does the trait have on survival and reproduction? (This is the central Evolutionary Psychology issue of adaptation.)
3. Ontogenetic: “How does it get that way in development?” That is, what genetic and environmental factors guide the development of a trait?
4. Phylogenetic: “How did it get that way in evolution?” That is, how does the evolutionary history of the species help us understand the structure of the trait in light of ancestral features?
These perspectives have one thing in common: the focus on action. Darwinian natural selection occurs on the basis of successful use. Embodiment involves bodily movement. Lashley was concerned with serial organization of output. Ethology focuses on behavior.
The main empirical base for the theory as it has developed over the last 15 years or so has come to be an ever-increasing set of speech macro patterns: statistical regularities in serial organization within and across syllables, some of them found wherever there is speech-like activity, but others present in only some speech domains. This mix of universals and non-universals makes it possible to triangulate in on a particular speech phenomenon. The connotation of the triangulation metaphor is that we can best understand a particular phenomenon by noting the extent to which it occurs at the intersection of various speech genres, or, put more simply, by noting the pattern of its occurrence and non-occurrence. My colleague Barbara Davis has been a partner in this work.
The Theory
Modern Adult Speech Organization and its Phylogenetic Roots
A theory of the evolution of speech must begin with a conception of what it is like now, even though it cannot end there. Lashley’s basic insight was that speech errors in which one or more parts of a speech utterance are accidentally displaced in an otherwise correct sequence tell you both about what the functional units of speech are and how they are serially organized. The main organizational principle that emerged from work on speech errors at the phonological level is summarized by the Frame/Content metaphor (e.g., Levelt, 1992). Consonant and vowel “content” elements are placed into syllable structure “frames.” The basic pattern is that misplaced consonant elements get put into consonantal positions in syllable structure and misplaced vowel elements get put into vowel positions. This is best illustrated by spoonerisms such as “pain mattern” for “main pattern” and “who nine” for “high noon.” Most importantly, consonants and vowels play mutually exclusive roles in on-line syllable organization. Vowels are syllable nuclei, and consonants occupy syllable margins. The two do not exchange with each other, as they would if “no” were produced as “own.”
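The slot-preserving character of these errors can be made concrete with a small sketch. It is only an illustration of the Frame/Content metaphor, not a model taken from the speech-error literature. The `Syllable` class, the `exchange` function, the onset/nucleus/coda slot names and the rough ASCII transcriptions (for example "ey" for the vowel of "main") are all assumptions introduced here.

```python
from dataclasses import dataclass, replace

# Slot names of the hypothetical syllable frame used in this sketch.
SLOTS = ("onset", "nucleus", "coda")


@dataclass(frozen=True)
class Syllable:
    """A syllable 'frame' whose slots hold consonant and vowel 'content'."""
    onset: str    # consonantal margin before the vowel (may be empty)
    nucleus: str  # vowel
    coda: str     # consonantal margin after the vowel (may be empty)

    def __str__(self):
        return self.onset + self.nucleus + self.coda


def exchange(syllables, i, j, slot):
    """Return a copy of the sequence with one slot's content swapped
    between syllables i and j. Content can only move between slots of
    the same kind, so a vowel can never land in a consonant position."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot}")
    a, b = syllables[i], syllables[j]
    result = list(syllables)
    result[i] = replace(a, **{slot: getattr(b, slot)})
    result[j] = replace(b, **{slot: getattr(a, slot)})
    return result


# "main pattern" in a rough ASCII transcription (an assumption of this sketch)
main_pattern = [Syllable("m", "ey", "n"), Syllable("p", "ae", ""), Syllable("t", "er", "n")]
# "high noon"
high_noon = [Syllable("h", "ay", ""), Syllable("n", "uw", "n")]

# Onset exchange: "main pattern" -> "pain mattern"
pain_mattern = [str(s) for s in exchange(main_pattern, 0, 1, "onset")]
# Nucleus exchange: "high noon" -> "who nine"
who_nine = [str(s) for s in exchange(high_noon, 0, 1, "nucleus")]
```

Because `exchange` takes a single slot name, an error of the “no” to “own” type, in which a consonant and a vowel trade places, cannot be expressed in this representation at all. That restriction is the sketch's stand-in for the mutual exclusivity of consonant and vowel roles.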
How did this F/C mode of organization arise? A basic observation lying behind Frame/Content theory is that the movements of the only articulator that is always involved in both consonants and vowels, the mandible, must have always been mutually exclusive for the two forms. Consonants involve mandibular elevation (mouth closing) and vowels involve mandibular depression (mouth opening). If this close-open alternation was the basic articulatory pattern of the simple initial phases of speech, before consonants and vowels evolved a separable mental superstructure derived from this peripheral action pattern, then there may never have been an opportunity for vowels and consonants to get mixed up with each other in the evolution of the control program.
Where did this mandibular oscillation come from? One possibility is that the pattern was exapted from a basic mandibular cyclicity that evolved in the earliest mammals in conjunction with new capacities for oral ingestion and processing of food: chewing, sucking and licking. The cycle may then have gone through an intermediate stage of visual communicative cyclicities (lipsmacks, tonguesmacks, teeth chatters), which are widespread in other modern primates (Redican, 1985), before being systematically paired with phonation to form protosyllables.
Selection pressure for mandibular cyclicities with phonation may have been exerted in the prespeech context of the evolution of “vocal grooming” as a substitute for actual hands-on grooming when ancestral troop sizes got too large for the latter to remain effective, as suggested by Dunbar (1996). More generally, the ability to do these motor frames, and importantly to imitate them, may have been just one aspect of the evolution of a general-purpose mimetic ability as suggested by Donald (1991). There are good reasons to accept Donald’s contention that a quantum jump in mimetic capability preceded the evolution of language and therefore speech.
Speech Ontogeny and its Implications for Phylogeny
The contention that the mandibular cycle is evolutionarily basic to speech is supported by the relatively fully formed emergence of the cycle with phonation at the advent of babbling in present-day 7-month-old infants. Infants don’t gradually and haltingly put together the close-open cycle in utterances such as “bababa.” These events are perceptibly rhythmic from the beginning, as befits a phylogenetically old function.
One particular fact, beyond the cyclical character of these early vocal patterns, made them seem relevant to the earliest speech of hominids. It is that the dominant alternation pattern was in fact a close-open one rather than the reverse. The close-open alternation also underlies the only universal syllable pattern in languages: the consonant-vowel (CV, close-open) syllable. Even profoundly deaf infants, if they produce syllable-like behavior at all, tend to favor this CV pattern (McCaffrey, Davis & MacNeilage, 2000). This suggests that it is a basic motor pattern, as these infants are unlikely to have derived it rather than its VC mirror image from the input.
During the babbling stage, from roughly 7 to 12 months of age, and the so-called 50-word stage, from 12 to 18 months, a vocal output episode consists primarily of a single CV or a series of repetitions of the same CV. The latter is called reduplicative babbling. Consonants are primarily labial (lip) or coronal (tongue front) stop consonants and nasals, and vowels are primarily in the quadrant of the vowel space that contains mid and low, front and central vowels.
Our approach to speech acquisition (MacNeilage & Davis, 1990), again motivated by Lashley, was to look more closely at serial organization patterns in babbling and early speech, both within and across CV alternations. In this work we focused on stop consonants and nasals. We used a simple classification of consonantal place of articulation into labial, coronal and dorsal and a simple 3 × 3 division of the vowel space into a height dimension (high, mid, low) and a front-back dimension (front, central, back).
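A coding scheme of this kind can be sketched in a few lines, purely as an illustration. The symbol tables, the function name `tally_cv` and the sample syllables below are assumptions made for this sketch. They are not the transcription conventions or the data of the studies cited, and only five of the nine vowel cells are filled.

```python
from collections import Counter

# Hypothetical ASCII stand-ins for stop and nasal consonants, by place.
CONSONANT_PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "coronal", "d": "coronal", "n": "coronal",
    "k": "dorsal", "g": "dorsal",
}

# Hypothetical vowel symbols mapped to (height, front-back) cells of the
# 3 x 3 grid. Only a few cells are filled in this sketch.
VOWEL_CELL = {
    "i": ("high", "front"),
    "u": ("high", "back"),
    "e": ("mid", "front"),
    "o": ("mid", "back"),
    "a": ("low", "central"),
}


def tally_cv(syllables):
    """Count how often each consonant place co-occurs with each vowel cell.

    `syllables` is an iterable of (consonant, vowel) symbol pairs."""
    counts = Counter()
    for consonant, vowel in syllables:
        counts[(CONSONANT_PLACE[consonant], VOWEL_CELL[vowel])] += 1
    return counts


# Invented sample, NOT real infant data: "ba ba ba di di"
sample = [("b", "a"), ("b", "a"), ("b", "a"), ("d", "i"), ("d", "i")]
sample_counts = tally_cv(sample)
```

A table of counts like `sample_counts`, compared with what the separate frequencies of each consonant place and each vowel cell would predict, is one simple way to look for within-syllable regularities.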
The main thing we discovered at the...