ONE

BABY TALK
The first experiments in fetal hearing were conducted in the early 1920s. German researchers placed a hand against a pregnant woman's belly and blasted a car horn close by. The fetus's startle movements established that, by around twenty-eight weeks' gestation, the fetus can detect sounds.1 Since then, new technologies, including small waterproof microphones implanted in the womb, have dramatically increased our knowledge of the rich auditory environment2 where the fetus receives its first lessons in how the human voice transmits language, feelings, mood, and personality.
The mother's voice is especially critical to this learning: a voice heard not only through airborne sound waves that penetrate the womb, but through bone conduction along her skeleton, so that her voice is felt as vibrations against the body. As the fetus's primary sensory input, the mother's voice makes a strong and indelible "first impression." Monitors that measure fetal heart rate show that, by the third trimester, the fetus not only distinguishes its mother's voice from all other sounds, but is emotionally affected by it: her rousing tones kick up the fetal pulse; her soothing tones slow it.3 Some researchers have proposed that the mother's voice thus attunes the developing nervous system in ways that predispose a person, in later life, toward anxiety or anger, calm or contentment.4 Such prenatal "psychological" conditioning is unproven, but it is probably not a bad idea for expectant mothers to be conscious, in the final two months of pregnancy, that someone is eavesdropping on everything they say, and that what the listener hears might have lasting impact. The novelist Ian McEwan used this conceit in his 2016 novel, Nutshell, which retells Shakespeare's Hamlet from the point of view of a thirty-eight-week-old narrator-fetus who overhears a plot (through "pillow talk of deadly intent") between his adulterous mother and uncle.
As carefully researched as that novel is regarding the surprisingly acute audio-perceptual abilities of late-stage fetuses, McEwan takes considerable poetic license. For even if a fetus could understand language, the ability to hear speech in the womb is sharply limited. The uterine wall muffles voices, even the mother's, into an indistinct rumble that permits only the rises and falls of emotional prosody to penetrate, in the same way that you can tell through the wall you share with your neighbor that the people talking on the other side are happy, sad, or angry, but you can't hear what they're actually saying. Nevertheless, after two months of intense focus on the mother's vocal signal in the womb, a newborn emerges into the world clearly recognizing the mother's voice and showing a marked preference for it.5 We know this thanks to an ingenious experiment invented in the early 1970s for exploring the newborn mind. Investigators placed a pressure-sensitive switch inside a feeding nipple hooked to a tape recorder. When the baby sucked, prerecorded sounds were broadcast from a speaker. Sounds that interested the infant prompted harder and longer sucking to keep the sound going and to raise its volume. Psychologist Anthony DeCasper used the device to show that three-day-olds will work harder, through sucking, to hear a recording of their own mother's voice over that of any other female.6 The father's voice sparked no special interest in the newborn,7 which, on acoustical grounds, isn't surprising. The male's lower pitch penetrates the uterine wall less effectively, and his voice is also not borne along the maternal skeleton. Newborns thus lack the two months of enwombed exposure to dad's speech that creates such a special familiarity with, and "umbilical" connection to, mom's voice.
The sucking test has revealed another intriguing facet of the newborn's intense focus on adult voices. In 1971, Brown University psychologist Peter Eimas (who invented the test) showed that we are born with the ability to hear the tiny acoustic differences between highly similar speech sounds, like the p and b at the beginning of the words "pass" and "bass." Both are made by an identical lip pop gesture. They sound different only because, with b, we make the lip pop while vibrating our vocal cords, an amazingly well-coordinated act of split-second synchronization between lips and larynx that results in a "voiced" consonant. With the p, we pop the lips while holding the vocal cords in the open position, making it "unvoiced." We can do this with every consonant: t, voiced, becomes d; k becomes hard g; f becomes v; ch becomes j. Babies, Eimas showed, hear these distinctions at birth, sucking hard with excitement and interest when a speech sound with which they've become bored (ga ga ga) switches to a fascinating new one (ka ka ka).8 Prior to Eimas's pioneering studies, it was believed that newborns only gradually learn these subtle phonemic differences.
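The consonant pairings in the paragraph above amount to a simple lookup: same mouth gesture, vocal cords on or off. A toy Python sketch (my illustration, not anything from Eimas's studies) makes the pattern explicit:

```python
# Unvoiced -> voiced consonant pairs from the text: each pair shares one
# articulatory gesture and differs only in whether the vocal cords vibrate.
VOICING_PAIRS = {
    "p": "b",   # lip pop
    "t": "d",   # tongue tap at the gum ridge
    "k": "g",   # back of the tongue at the soft palate (hard g)
    "f": "v",   # lower lip against the upper teeth
    "ch": "j",  # tongue blade at the palate
}

def voiced_counterpart(consonant):
    """Return the voiced twin of an unvoiced consonant, if the text lists one."""
    return VOICING_PAIRS.get(consonant)

print(voiced_counterpart("p"))  # → b
```

The single dictionary underscores the text's point: voicing is one binary feature laid over an otherwise identical gesture, which is why the acoustic difference infants detect is so tiny.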
The significance of this for the larger question of how we learn to talk emerged when Eimas tested whether infants could discriminate between speech sounds from languages they had never heard, in the womb or anywhere else. For English babies this included Kikuyu (an African language), Chinese, Japanese, French, and Spanish, all of which feature minuscule differences in shared speech sounds, according to the precise position of the tongue or lips, or the pitch of the voice. The experiments revealed that newborns can do something that adults cannot: detect the most subtle differences in sounds. Newborns, in short, emerge from the womb ready and willing to hear, and thus learn, any language: all seven thousand of them. This stands to reason, because a baby doesn't know if it is going to be born into a small French town, a hamlet in Sweden, a tribe in the Amazon, or New York City, and must be ready for any eventuality.9 For this reason, neuroscientist Patricia Kuhl, a leading infant language researcher, calls babies "linguistic global citizens"10 at birth.
But after a few months, babies lose the ability to hear speech sounds not relevant to their native tongue, which has huge implications for how infants sound when they start speaking. Japanese people provide a good example: when speaking English, adults routinely swap out the r and l sounds, saying "rake" for "lake," and vice versa. They do this because they cannot hear the difference between English r and l. But Japanese newborns can, as Eimas's sucking test shows. Change the ra sound to la, and Japanese babies register the difference with fanatic sucking. But around seven months of age, they start having trouble telling the difference. At ten months old, they don't react at all when ra changes to la. They can't tell the difference anymore. English babies actually get better at it.
The reason is exposure and reinforcement. The ten-month-old English baby has spent almost a year hearing the English-speaking adults around her say words that are distinguished by clearly different r and l sounds. Not the Japanese baby, who spent the ten months after birth hearing a Japanese r that sounds almost identical to our English l, the tongue lightly pushing against the gum ridge behind the upper front teeth. Because there is no clear acoustic difference between the Japanese r and the English l, Japanese babies stop hearing a difference. They don't need to, because their language doesn't depend on it.
All of which is to say that the developing brain works on a "use it or lose it" basis. Circuitry not activated by environmental stimuli (mom's and dad's voices) is pruned away. The opposite happens for brain circuits that are repeatedly stimulated by the human voice. They grow stronger, more efficient. This is the result of an actual physical process: the stimulated circuits grow a layer of fatty cells, called myelin, along their axons, the spidery branches that extend from the cell body to communicate with other cells. Like the insulation on a copper wire, this myelin sheath speeds the electrical impulses that flash along the nerve branches connecting the neurons that represent specific speech sounds. Neuroscientists have a saying: "Neurons that fire together, wire together," which is why the English babies in Eimas's experiments got better at hearing the difference between ra and la: the neuron assemblies for those sounds fired a whole lot and wired themselves together. Not so for Japanese babies.
In short, the voices we hear around us in infancy physically sculpt our brain, pruning away unneeded circuits, strengthening the necessary ones, specializing the brain for perceiving (and eventually producing) the specific sounds of our native tongue.
Some infants fail to "wire in" the circuits necessary for discriminating highly similar sounds. Take the syllables ba, da, and ga, which are distinguished by where, in the mouth, the initial sound is produced (b with a pop of the lips; d with a tongue tap at the gum ridge; g with the back of the tongue hitting the soft palate, also called the velum). These articulatory targets determine how the resulting burst of noise transitions into the orderly, musical overtones of the a-vowel that follows: a sweep of rapidly changing frequencies, over tens of milliseconds, that the normal baby brain, with repetition, wires in by myelinating the correct nerve pathways.
But some 20 percent or so of babies, for unknown reasons, fail to develop the circuits for detecting those fast frequency sweeps. Sometimes a child registers ba, sometimes ga or da. Parents are unaware of the problem because kids compensate by using contextual clues. They know that mom is saying "bat" and not "pat" because she's holding a bat in her hand. They know dad is talking about a "car" because he's pointing at one. The problem surfaces only when the child starts school and tries to learn to read; that is, to translate written letter-symbols into the speech sounds they represent. He can't do it, because his brain hasn't wired in the sounds clearly. He might read the word "dad" as "bad," or "gab," or "dab." These children are diagnosed with dyslexia, a reading disorder long believed to be a vision problem (it was once called "word blindness"). Thanks to pioneering research in the early 1990s by neuroscientist Paula Tallal at Rutgers University, dyslexia is now understood to be a problem of hearing, of processing human voice sounds.11 Tallal has been helping to devise software that slows the frequency sweeps in those consonant-vowel transitions so that very young children can train their auditory circuits to detect the different speech sounds, and thus wire them in through myelination of the nerve pathways. All to improve their reading.
Of course, to learn a language, it is not enough simply to recognize the difference between pa and ba, or la and ra. To understand speech, and one day to produce it, babies must accomplish another exceedingly difficult feat of voice perception. Though it might seem, to us, as if we insert tiny gaps of silence between words when we speak (like the spaces between words on a printed page), that's a perceptual illusion. All voiced language is actually an unbroken ribbon of sounds slurring together. To learn our native tongue, we first had to cut that continuous ribbon into individual words, which is not easy when you're a newborn and have no idea what any words mean. You can get an idea of what you were up against by listening to a YouTube clip of someone speaking a language you don't know: Croatian, or Swahili, or Tagalog. Try listing ten words. You can't do it, because you can't tell where one word ends and another begins. This is the problem you faced at birth, and, by around eight months, had solved.
Here's how. Despite appearances, babies, reclining in their strollers or lying in their cribs, are anything but passive receptors of the speech that resounds all around them. Indeed, even before birth (from the seventh month of gestation onward), the fetus runs a complex statistical analysis on the voices it perceives, and registers patterns. The sucking test shows that one pattern newborns detect is word stress.12 English, on average, emphasizes the first syllable of words: contact, football, hero, sentence, mommy, purple, pigeon; words that emphasize the second syllable (like surprise) are far less common. In French, it's the reverse, a weak-strong pattern: "bonjour," "merci," "vitale," "heureux." Babies zero in on these patterns and use them to locate word boundaries. Take a mystifying sequence of speech sounds like:
staytleeplumpbukmulaginkaymfrumtheestarehed
An American baby will apply English's strong-weak probability to identify the first sound cluster (staytlee) as a possible stand-alone word (STAYT-lee, or "Stately"). The next two syllables, however (plumpbuk), don't make an English word, no matter what stress pattern you apply (PLUMP-buk; plump-BUK). To deal with that, the baby uses another type of statistical analysis. In all languages, the likelihood that one speech sound will follow another is highest within words, less likely across words. Patricia Kuhl supplies a good example from Polish, where the zb combination is common, as in the name Zbigniew.13 But in English zb occurs only across word boundaries, as in "leaveZ Blow" or "windowZ Break," and thus crops up less frequently. Sophisticated listening tests show that eight-month-olds use these "transition probabilities" to segment the sound stream; babies can do this after just two minutes' exposure to a stream of unfamiliar speech sounds.
This staggering speed of learning speaks to Darwin's assertion, in The Descent of Man, that speech acquisition in children reveals not an instinct for language, but an instinct to learn, as in an English baby's lightning-fast realization that the pb in plumpbuk is illegal and that it makes sense to split the speech stream there, to create the separate chunks plump and buk. Eventually, the child will use both statistical strategies to help segment the entire sequence and arrive at the first words of James Joyce's Ulysses:
Stately, plump Buck Mulligan came from the stairhead…
She will accomplish this stunning feat before her first birthday, well before she has the least clue about what any of the words actually mean. But in snipping the sound ribbon into its separate parts, the baby stands a chance of figuring out how to assign meaning to each small cluster of sounds, clusters we call "words."
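The second of the two segmentation strategies, transition probabilities, can be sketched in a few lines of code. The toy Python below is my illustration, not anything from the research literature: the three made-up "words" (badi, kupa, gola) and the 0.8 cutoff are invented for the demonstration. It tallies how often each syllable follows another in a stream, then cuts the stream wherever that probability dips, recovering the word boundaries:

```python
from collections import Counter

def transition_probs(stream):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(stream, probs, threshold=0.8):
    """Cut the stream wherever the transition probability dips below threshold."""
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if probs.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# A toy "language" of three invented words (badi, kupa, gola) spoken in varied
# order. Within a word, the next syllable is fully predictable (P = 1.0);
# across a word boundary, the next syllable varies, so the probability drops.
stream = ["ba", "di", "ku", "pa", "go", "la", "ku", "pa", "ba", "di",
          "go", "la", "ba", "di", "ku", "pa", "go", "la"]

print(segment(stream, transition_probs(stream)))
# → ['badi', 'kupa', 'gola', 'kupa', 'badi', 'gola', 'badi', 'kupa', 'gola']
```

The word boundaries fall out of nothing but counting, which is the point of the infant studies: no meanings are needed, only statistics over the sound stream.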
Babies do not do all this work on their own. They receive significant help from adults, who unconsciously adopt a highly artificial vocal style when addressing them.
Remarkably, no language expert took any formal notice of the unusual way we talk to infants until 1964, when Charles A. Ferguson, a linguist at Stanford University, published the paper "Baby Talk in Six Languages." It catalogued the identical way parents speak to babies in a slew of widely different tongues, including Syrian Arabic, Marathi (a language of western India), and Gilyak (spoken in Outer Manchuria), as well as English and Spanish. In each instance, caregivers prune consonants (as when English parents use "tummy" rather than "stomach") and use onomatopoeia (in English, "choo choo" for "train," and "bow wow" for "dog").14 Ferguson was not, however, investigating how babies learn to speak; you could even say he was doing the exa...