Excellent reviews of selected aspects of this subject have already appeared. Marsden (1965, 1971); Holsti (1968); and Kiesler (in press) have reviewed numerous studies that have investigated the content of two-person interviews and conversations: what people discuss while conversing. Several excellent articles also have appeared that review related studies in which one conversational partner attempts to control or influence the actual content of his partner's speech through a methodology called verbal conditioning (Krasner, 1958, 1971; Salzinger, 1959; Greenspoon, 1962; Spielberger, 1965; Kanfer, 1968; Heller and Marlatt, 1969). In addition, Flanagan (1965) has reviewed studies on the physical qualities of speech, that is, studies involving spectrographic analyses of frequencies (tone) and intensities (volume). Hargreaves and Starkweather (1968) have shown that such spectrographic analysis can be used to establish the identity of individual speakers from these same physical properties of speech alone (that is, yielding “voice prints” akin to fingerprints). Pittenger, Hockett, and Danehy (1960) provide an excellent introduction to still another method of studying speech, namely, the study of what they term the paralinguistic aspects of speech (sighs, slurs, drawls, inhalations, loudness and softness, breathiness, speech coughs, etc.). Finally, Starkweather (1964) has provided an excellent review of additional dimensions of conversational speech, such as rate of speaking, reliability of human judgments in studies of speech, judging emotion from a speaker’s voice, and recognition of speaker identity from voice alone, as well as related variables in human conversation. Because these reviews are already available, this discussion will not deal with these topics, but the interested reader is encouraged to consult them for a fuller understanding of these additional dimensions of human speech.
Two Approaches to Studies of Speech
Investigators interested in studying speech, whether occurring in two-person conversations (the focus of our own studies) or in groups of three or more persons, traditionally have focused on either of two facets of such speech: what the speaker says or how he says it. The first of these two approaches is called content analysis and includes study of variables such as frequency of usage of grammatical units (verbs, pronouns, adjectives, and the like) and themes (references to parents or other significant persons, distress-relief words, past or future tense usage, affectionate or hostile words, so-called manifest or latent meanings, degree of inferred empathy or lack of empathy toward the conversational partner inherent in the statement, anxiety-laden versus neutral themes and topic areas). (See the reviews of such content-oriented studies cited above.) The second approach to the study of human speech deals not with what is said (content) but, rather, with how it is said. Physicists, electronic engineers, experts in acoustics, students of high-fidelity sound recording, and others have played speech (spoken or sung) into oscilloscopes and other electronic equipment in order to analyze its formal components, such as its frequencies, intensities, timbre, and related acoustical qualities. Flanagan (1965) provides an excellent review of this acoustical branch of the study of how human speech is delivered.
Investigations in one subarea of how people speak are often referred to as content-free studies, or studies of the noncontent dimensions of speech. They also are sometimes referred to as studies of the formal properties of speech (for example, studies of the frequencies and durations of individual utterance units, interruptions of one’s conversational partner, and latency before answering him). However, as indicated above, the acoustical variables reviewed by Flanagan also represent formal, noncontent, or content-free properties of speech. We and our colleagues have not conducted any studies of the Flanagan type, that is, analyses of the physical properties of speech. We have limited ourselves to the study of frequencies and durations of single units of utterance, latency, and interruption in two-person and multi-person conversational groups.
From the beginning we have been interested in studying the current emotional, attitudinal, or motivational state of our interviewees, whether in psychotherapy or in other real-life interview encounters. The miniature theoretical framework, emanating from the work of Chapple (1942), that guided our initial choice of interaction variables is described at length in Matarazzo, Saslow, and Matarazzo (1956) and Saslow and Matarazzo (1959).
Defining Basic Units
Difficult as it might be to realize, a major hurdle to research on human speech (whether involving content or noncontent approaches) has been a lack of agreement among investigators on how to define the basic unit or units to be studied. The reader interested in this problem should ask himself how he would go about analyzing in a systematic manner the words written on the page he is currently reading; or, better still, the transcribed employment interview given in the appendix to this volume. After having decided on a category system the reader would next have to ask how he would divide the page of words (or interview) into units to apply his (probably content-oriented) category system. Would he simply use single sentences as his unit for analysis? Or shorter phrases? If the latter, how would he decide where one phrase begins and ends? After the reader arrives at an approach that satisfies him, he should ask a friend to carry out the same steps independently. The predictable resulting lack of agreement and confusion, if not downright hostility, between the reader and his friend provides a brief glimpse at what has transpired between serious investigators during the past two decades. Marsden (1965, 1971); Holsti (1968); and Kiesler (in press) have summarized and critically reviewed the major problems in this subarea.
One would expect that reaching agreement on standardized definitions of noncontent properties, such as the durations of single utterances in spoken conversation, would present fewer problems for investigators working in different settings. Unfortunately, agreement has been no more easily achieved by these investigators than by the content-analysis specialists. The reader should listen to the next person who engages him in conversation. How would the reader define an utterance? Is it all the words his conversational partner says until he stops and it is the reader’s turn to speak? Or is it all the words in a part of a sentence that he utters fluently until he pauses for a breath, so that hesitations for breathing define a logical place to end and begin a unit? But if he chooses the latter approach to “unitizing,” how long will the pause have to be before it is really scored as a pause? One second? One hundredth of a second? Lest the reader conclude that these are academic questions, let us hasten to point out that most, if not all, of the results presented below quite probably would change markedly if the unit of study were defined differently from the way we will define it shortly.
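To make the pause-length question concrete, here is a minimal Python sketch, not a procedure used in any of the studies reviewed here, that splits one speaker’s stream of speech bursts into units wherever a silent gap reaches a chosen minimum pause. The burst timings, the `unitize` function, and the three thresholds are all hypothetical; the point is only that the same stretch of speech yields a different number of units, and a different mean unit length, as the threshold changes.

```python
# Hypothetical speech bursts for ONE speaker: (start_s, end_s) in seconds,
# separated by silent gaps. The timings are invented for illustration.
BURSTS = [(0.0, 1.2), (1.5, 3.0), (4.1, 5.0), (5.05, 6.3), (8.0, 9.0)]


def unitize(bursts, min_pause):
    """Merge consecutive bursts into units.

    A silent gap of at least `min_pause` seconds closes the current unit;
    any shorter gap is absorbed into it (and counts toward its length).
    """
    units = []
    start, end = bursts[0]
    for next_start, next_end in bursts[1:]:
        if next_start - end >= min_pause:
            units.append((start, end))   # gap long enough: close the unit
            start = next_start
        end = next_end
    units.append((start, end))
    return units


for threshold in (0.01, 1.0, 2.0):
    units = unitize(BURSTS, threshold)
    mean = sum(e - s for s, e in units) / len(units)
    print(f"pause >= {threshold} s: {len(units)} units, mean {mean:.2f} s")
```

With these invented timings, a 0.01-second threshold yields five short units, a 1-second threshold yields three, and a 2-second threshold collapses everything into a single 9-second unit.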
Systematic work on the problem of how to define the length of an utterance probably began with Norwine and Murphy (1938), two specialists working for the Bell System. They monitored and analyzed 51 telephone calls between the Chicago and New York City business offices of the Bell System. The voice records were fed into a recording device that produced an oscillograph of the speech output of each of the two speakers in the 51 conversational pairs. Before proceeding with their analysis, Norwine and Murphy had to define their speech measures. Their solution to the problem of what to record as a unit, and how to record it, is probably best presented in their own words.
In the simplest case of conversational interchange each party speaks for a short time, pauses, and the other party replies. The time intervals are then simply the lengths of time each party speaks and the lengths of the pauses between speeches. The period during which there is speech may be called a talkspurt, and the length of the pause may be called the response time. These two quantities would then suffice to describe this simple type of interchange.
In many instances, however, the process is not so orderly; for example, one speaker may pause and then resume speaking, or the listener may begin to reply without waiting for the end of the talker’s speech. The possible, and indeed frequently encountered, variations of the simple cycle of which the preceding examples constitute only a fraction make it necessary to carefully define and delimit the elements into which a conversation may be resolved. It is believed that any telephonic conversation between two persons can be completely described in terms of the presence or absence of energy by the following time elements:
A talkspurt is speech by one party, including his pauses, which is preceded and followed, with or without intervening pauses, by speech from the other party perceptible to the one producing the talkspurt. Obvious exceptions to this definition are the initial and final talkspurts in a conversation. There may be simultaneous talkspurts by the two talkers; if one party is speaking and at the same time hears speech from the other, double talking is said to occur.
Resumption time is the length of the pause intervening between two periods of speech within a talkspurt.
Response time is the length of the interval between the beginning of a pause as heard by the listener and the beginning of his reply. It may be positive or negative. The pause to which reference is made ordinarily occurs at the end of a talkspurt but may be a pause followed by a resumption of speech by the first talker (p. 282).
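The Norwine and Murphy time elements can be approximated in code. The Python sketch below is our own simplified reading, not Bell System procedure: it assumes each party’s speech has already been reduced to time-stamped segments of energy (the hypothetical `Segment` type), sorted by onset, with no overlap between one party’s own segments, and it ignores transmission delay, so a pause is treated as heard the instant it begins. Consecutive segments from the same party are merged into one talkspurt, the gaps inside it are resumption times, and the interval from the end of one talkspurt to the onset of the next is the response time, which comes out negative when the reply overlaps the previous talkspurt (double talking).

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    speaker: str   # e.g. "A" or "B"
    start: float   # onset of energy, seconds
    end: float     # offset of energy, seconds


def talkspurts(segments: List[Segment]):
    """Group energy segments into talkspurts and derive timing measures.

    Simplifying assumptions (ours): segments are sorted by start time,
    one party's segments never overlap each other, and transmission
    delay is ignored.
    """
    spurts = []            # each entry: [speaker, start, end]
    resumption_times = []
    for seg in segments:
        if spurts and spurts[-1][0] == seg.speaker:
            # Same party resumes: the silent gap is a resumption time.
            resumption_times.append(seg.start - spurts[-1][2])
            spurts[-1][2] = seg.end
        else:
            spurts.append([seg.speaker, seg.start, seg.end])
    # Response time: onset of each talkspurt minus the end of the previous
    # one; negative when the reply starts before the other party stops.
    response_times = [b[1] - a[2] for a, b in zip(spurts, spurts[1:])]
    return spurts, resumption_times, response_times


example = [Segment("A", 0.0, 2.0), Segment("A", 2.5, 4.0),
           Segment("B", 4.75, 6.0), Segment("A", 5.5, 7.0)]
spurts, resumptions, responses = talkspurts(example)
print(resumptions, responses)  # [0.5] [0.75, -0.5]
```

In the example, A’s two segments form a single talkspurt with one 0.5-second resumption time; B replies after 0.75 seconds, and A’s next talkspurt begins half a second before B stops, giving a negative response time.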
From this description it is clear that for Norwine and Murphy, and their contemporary successors at the Bell Telephone Laboratories (see Brady, 1965, 1968, 1969), the basic unit of utterance (a talkspurt) is what a layman might define as an utterance, namely, everything said (or the total speaking duration consumed) by one person from the moment he begins a new speech unit to the time he signals to his conversational partner that he is through with his share of his contribution and thus that the latter may now speak.
Duration of utterance
Because all the research reviewed later depends upon the reader’s understanding of what we recorded as a unit of utterance, a complete employment interview is given as an example in the appendix of this volume (following the references). The interview was conducted by one of the authors with an applicant for the position of patrolman in the same city. He was one of the 60 subjects in a second study we carried out on the relationship between the content of speech (what the interviewee is talking about) and the noncontent measures we were studying (for example, how long he speaks per utterance about his occupational background versus his educational or family background). The interviewer (Joseph D. Matarazzo) is designated “I” and the applicant “A” in the transcription. Except for the changes in or omissions of places and names of individuals that we made to protect the anonymity of our respondents, the transcription is not altered in any way. In all our research, and in the accompanying transcribed sample interview, an utterance (or speech unit) is recorded as the total time it takes a speaker to emit all the words he contributes in that particular unit of exchange (as this would be judged by common social standards).
In the transcribed interview in the Appendix, single units of utterance for each speaker are transcribed next to his identifying designation (“I” or “A”). Thus, for example, as shown in the transcription, the interviewer’s first (single) utterance includes three sentences: “How do you do, Mr.___________? My name is Dr.______. Can you tell me how you happened to apply for a patrolman’s position with the city of______?” Similarly, even though the interviewer’s second utterance (beginning with “You say . . . ,” and ending with “. . . to you”) constitutes two sentences, it still is recorded as a single utterance. Likewise, the applicant’s first single utterance is defined as everything he said between “Well” and the words “. . . sounded real interesting to me.” It should be clear that an utterance can contain only one word (such as “Yes” or “Why?”), or two words (for example, the applicant’s response “My dad?” in the middle of the appended interview), or it can contain many hundreds of words. In addition, an incomplete utterance, though nevertheless recorded as a full utterance, can terminate in the middle of a sentence as, for example, when the other person interrupts and the first speaker clearly stops speaking (either for a clearly defined pause or until the interrupting partner completes the utterance that constituted the interruption).
In our studies, following Norwine and Murphy and the anthropologist Eliot D. Chapple (1942), whose speech research gave impetus to our own investigations, we have defined a speech unit as an utterance bounded at either end by a silence period: one silence following the other participant’s last comment (that is, the speaker’s latency or response time) and the second following the speaker’s own comment and preceding the listener’s next comment (that is, the listener’s latency).¹ Pauses for breathing, for choosing words, for reflection, for stuttering and stammering or other hesitation phenomena, or related disfluencies (what Norwine and Murphy call “resumption time”) do not signal the end of that particular utterance. Rather, they are included in the definition and recording of a single utterance when the content clearly suggests (and both conversational partners clearly appear to acknowledge) that, despite this disfluency, the current speaker has not completed that utterance. An example of one utterance or unit is: “Yes, I can tell you about . . . (pause) . . . my father. He was a . . . (pause) . . . how can I say it . . . (pause) . . . a man of many talents. You would . . . (pause) . . . have . . . (pause) . . . liked him.” However, pauses (again determined by context) that precede the introduction of new ideas or thoughts by the same individual, without an intervening comment by the other interview participant, signal the onset of a new speech unit. For example: “It’s true that I’ve . . . (pause) . . . enjoyed hunting for most of my life.” (Pause of two to three seconds) . . . “Speaking of hunting, I’m reminded of a favorite dog I had several years ago.” The context clearly suggested that one unit ended with “life” and a second, new unit began with “Speaking.” See Rogers (1942, pp. 265-437) and Wolberg (1954, pp. 688-780) for numerous examples of typical single interview units as scored by us.
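The unitizing rule just described can be sketched in the same spirit. In this hypothetical Python sketch, each burst of speech carries a `new_thought` flag that stands for the coder’s contextual judgment; the code cannot make that judgment itself. A new unit begins when the speaker changes or when the same speaker, after a pause, is judged to introduce a new idea; every other pause is absorbed into the current unit and counts toward its duration. Unlike a Norwine and Murphy talkspurt, a unit defined this way can end without any speech from the other party. The `Burst` type, the `utterances` function, and the timings are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Burst(NamedTuple):
    speaker: str        # "I" (interviewer) or "A" (applicant)
    start: float        # seconds
    end: float          # seconds
    new_thought: bool   # hypothetical field: coder judged a new idea begins here


def utterances(bursts: List[Burst]):
    """Return (speaker, duration) for each unit of utterance.

    A new unit starts on a change of speaker, or when the same speaker
    pauses and is judged (from context) to introduce a new idea. Pauses
    for breathing, word finding, stammering, etc. stay inside the unit.
    """
    units = []  # each entry: [speaker, start, end]
    for b in bursts:
        same_speaker = bool(units) and units[-1][0] == b.speaker
        if same_speaker and not b.new_thought:
            units[-1][2] = b.end   # absorb the pause into the current unit
        else:
            units.append([b.speaker, b.start, b.end])
    return [(spk, end - start) for spk, start, end in units]


# The hunting example from the text, with invented timings.
hunting = [
    Burst("A", 0.0, 2.0, False),    # "It's true that I've ..."
    Burst("A", 2.5, 5.0, False),    # "... enjoyed hunting for most of my life."
    Burst("A", 7.5, 10.0, True),    # "Speaking of hunting, ..."
    Burst("I", 10.5, 12.0, False),  # interviewer's next comment
]
print(utterances(hunting))  # [('A', 5.0), ('A', 2.5), ('I', 1.5)]
```

The hesitation inside the first sentence is absorbed into a 5-second unit, while the two-to-three-second pause before “Speaking of hunting” opens a second unit for the same speaker.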
We have found observer reliability for this duration of utterance variable and for the other variables described here (reaction time latency, initiative time latency, and number and percentage of interruptions) to be unusually high (Phillips, Matarazzo, Matarazzo, and Saslow, 1957; Wiens, Molde, Holman, and Matarazzo, 1966). Other investigators have confirmed the finding of high observer (or inter-scorer) reliability for one or another of these four speech variables (Chapple, 1949, p. 301; Goldman-Eisler, 1951, p. 355; Cervin, 1956, p. 164; Mahl, 1956a, p. 4; Kanfer and Marston, 1962, p. 427; Siegman and Pope, 1965, pp. 525-526; Pope and Siegman, 1966, p. 150; and Jaffe and Feldstein, 1970, p. 132).
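The text does not say which reliability index each of the cited studies used. One common choice for a continuous measure such as duration of utterance is the Pearson correlation between two observers’ independent scorings of the same interviews; the minimal Python sketch below computes it from scratch. The `pearson_r` function and the observer data are hypothetical, invented only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented mean utterance durations (seconds) for five interviews,
# each scored independently by two observers.
observer_1 = [22.4, 31.0, 18.7, 40.2, 27.5]
observer_2 = [23.1, 30.2, 19.5, 39.0, 28.3]
print(round(pearson_r(observer_1, observer_2), 3))
```

A correlation near 1 indicates that the two observers rank and space the interviews almost identically, although it does not by itself show that their absolute durations agree.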
Beginning in 1939, Chapple used essentially this commonsense definition of a unit of interaction to characterize human relationships, and he studied the frequency and duration of such alternating series of what he called actions and inactions in a variety of two-person pairs or dyads (supervisor-supervisee, personnel interviewer-job applicant, doctor-patient). A summary of this work is given in Chapple (1949). He included in his unit of communicative action (our duration of utterance variable) all verbal as well as nonverbal communications (for example, talking, smiling, head nodding), and we earlier followed this practice. However, we soon learned that, relative to verbal exchanges and with the exception of highly unusual situations such as some segments of psychotherapy or those reviewed by Jaffe and Feldstein (1970, p. 13), smiling, head nodding, and other nonverbal communicative gestures occurring in isolation (without concurrent speech) rarely constitute more than a very small fraction of any sample of conversation. Consequently, these nonverbal gestures can be omitted from the definition of a unit of utterance with little or no loss in fidelity (Wiens, Molde, Holman, and Matarazzo, 1966), and we have omitted them for approximately the past ten years.²
To obtain a speaker’s average (mean) duration of utterance in any single conversation, one adds up all the single durations of utterance he emitted and divides this sum by the number of such utterances. That is, the mean duration of A’s utterances equals the sum of the durations of all A’s single utterances divided by the total number of single units of utterance emitted by A; the same procedure is followed for speaker B.
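The averaging just described is simple enough to state in a few lines of Python. The `mean_durations` function and the sample conversation below are hypothetical; the durations are assumed to have been measured already, in seconds, under the unit-of-utterance definition given above.

```python
from collections import defaultdict


def mean_durations(units):
    """Mean duration of utterance per speaker.

    `units` is a list of (speaker, duration_in_seconds) pairs for one
    conversation. Each speaker's durations are summed and the sum is
    divided by that speaker's number of units.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for speaker, duration in units:
        totals[speaker] += duration
        counts[speaker] += 1
    return {spk: totals[spk] / counts[spk] for spk in totals}


conversation = [("A", 12.0), ("B", 3.0), ("A", 30.0), ("B", 5.0), ("A", 6.0)]
print(mean_durations(conversation))  # {'A': 16.0, 'B': 4.0}
```

Here A’s three utterances total 48 seconds (mean 16 seconds) and B’s two total 8 seconds (mean 4 seconds).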
In common with Norwine and Murphy, we also have recorded and studied...