
Fundamental Frequency

The fundamental frequency refers to the lowest frequency of a periodic waveform, such as a sound wave produced by the human voice. In linguistics, it is associated with the pitch of a spoken utterance and is crucial for conveying intonation and meaning in language. The fundamental frequency is measured in hertz and plays a significant role in phonetics and speech analysis.

Written by Perlego with AI-assistance

3 Key excerpts on "Fundamental Frequency"

  • Speech Acoustic Analysis
    • Philippe Martin (Author)
    • 2020 (Publication Date)
    • Wiley-ISTE (Publisher)
    The laryngeal frequency can vary considerably during phonation and can extend over several octaves. In extreme cases, transitions from 100 Hz to 300 Hz (change from normal phonation to falsetto mode) can be observed during an interval of two or three cycles.
    Furthermore, successive cycles can show variations of several percent around a mean value, depending, among other things, on the physiological state of the muscles involved in the vibration mechanism. Even direct observation (for example, by rapid cinematography) does not always allow precise identification of the cycle beginnings, due to creaky voice, breath, etc. This may result in errors that are difficult to minimize.
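The cycle-to-cycle variation described above can be expressed as a single percentage. The sketch below uses one common formulation, usually called local jitter (a term the excerpt itself does not use): the mean absolute difference between consecutive cycle durations, divided by the mean cycle duration. The function name and the sample period values are hypothetical, chosen only for illustration.

```python
def local_jitter_percent(periods_ms):
    """Mean absolute difference between consecutive cycle durations,
    expressed as a percentage of the mean cycle duration."""
    if len(periods_ms) < 2:
        raise ValueError("need at least two cycle durations")
    # Absolute change from each cycle to the next one.
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_diff / mean_period


# Hypothetical cycle durations (ms) scattered around 10 ms, i.e. roughly 100 Hz.
example_periods = [10.0, 10.2, 9.9, 10.1, 9.8]
print(f"local jitter: {local_jitter_percent(example_periods):.2f} %")
```

This measure presupposes that cycle boundaries have already been located, which, as the excerpt notes, is itself error-prone in creaky or breathy voice.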
    The name “Fundamental Frequency”, given to the acoustic measurement of laryngeal vibration, derives from its similarity to the same term used for the base frequency in a Fourier analysis, that is, the frequency of which the harmonics are integer multiples. This can sometimes result in confusion, which the context is not always sufficient to resolve.
    F0 can be measured from the speech signal in the time domain, for example after signal filtering, or in the frequency domain, from the Fundamental Frequency (in the Fourier sense) of a voiced sound. The successive F0 values are plotted over time to produce a so-called pitch curve for the phonation (Figure 7.6). This pitch curve conventionally displays null values for segments of unvoiced speech or silence.
    Figure 7.6. Example of a pitch curve displayed as a function of time and varying from approximately 95 Hz to 260 Hz.
    The difficulty in measuring the Fundamental Frequency is largely due to the fact that, strictly speaking, there are no glottal vibration cycles, but rather the recurrence of a movement that is controlled by numerous parameters (adductor and tension muscles controlling the vocal folds, subglottal pressure, etc.). The speech signal from which the measurement is made is the result of the complex interaction of glottal excitation and temporal variations in the shape of the vocal tract.
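As a rough illustration of the time-domain route mentioned in this excerpt, the sketch below estimates F0 frame by frame with a normalized autocorrelation and builds a pitch curve that reports 0 for unvoiced or silent frames. It is not the method of the book; the search range (75 to 400 Hz), the voicing threshold of 0.3, the frame and hop sizes, and all function names are assumptions made for the example.

```python
import math


def estimate_f0_autocorr(frame, sample_rate, f0_min=75.0, f0_max=400.0,
                         voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame by autocorrelation.

    Returns 0.0 when the frame is silent or judged unvoiced.
    The range and threshold defaults are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]           # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0                          # silence
    lag_min = int(sample_rate / f0_max)     # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r < voicing_threshold:
        return 0.0                          # no clear periodicity
    return sample_rate / best_lag


def pitch_curve(signal, sample_rate, frame_len=640, hop=160):
    """F0 estimate per frame; 0.0 marks unvoiced or silent frames."""
    return [estimate_f0_autocorr(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]


# Synthetic test signal: 0.1 s of a 120 Hz harmonic tone, then 0.1 s of silence.
SR = 16000
voiced = [math.sin(2 * math.pi * 120 * t / SR)
          + 0.5 * math.sin(2 * math.pi * 240 * t / SR)
          + 0.25 * math.sin(2 * math.pi * 360 * t / SR)
          for t in range(1600)]
demo_signal = voiced + [0.0] * 1600
print([round(f0, 1) for f0 in pitch_curve(demo_signal, SR)])
```

On real speech a bare autocorrelation peak picker like this is prone to octave errors and voicing mistakes, for the reasons the excerpt gives: the glottal movement is only approximately periodic and the vocal tract shape keeps changing.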
  • Forensic Speaker Identification
    • Phil Rose (Author)
    • 2002 (Publication Date)
    • CRC Press (Publisher)
    front rounded vowel. This relationship between auditory and acoustic F-pattern features is undoubtedly one reason why formants are typically sampled in forensic-phonetic comparisons. In such investigations, a prior auditory analysis is indispensable in order to identify areas of potential relevance that are then investigated and quantified acoustically. F-pattern (and Fundamental Frequency, to be discussed below) are correlates of auditory, transcribable qualities of speech that make the quantification and evaluation of the auditory analysis possible. In addition, of course, the behaviour of some formants is interpretable, via the acoustic theory of speech production described earlier in this chapter, in an articulatory coherent way. This is of obvious forensic use when one is trying to make inferences from acoustic patterns to the vocal tract that produced them.
    This point is mentioned because in automatic speaker recognition (speaker recognition carried out under optimum conditions) formants are not generally used. Instead, other acoustic parameters, such as LP-derived cepstral features, are employed. Although these are more powerful than formants (they can discriminate between voices better), it has been assumed that they are more difficult to apply forensically because they do not relate in a straightforward way to articulation. Another reason put forward for preferring formants over automatic parameters is that they are the lesser of two evils when it comes to explaining the evidence to a jury.

    Fundamental Frequency

    This section discusses and exemplifies what many consider to be one of the most important parameters in forensic phonetics: Fundamental Frequency. Braun (1995: 9), for example, quotes four well-known authorities (French 1990a; Hollien 1990; Künzel 1987; and Nolan 1983) who claim that it is one of the most reliable parameters.
    Fundamental Frequency is abbreviated F0, and also called ‘eff-oh’, or ‘eff sub-zero’. It is also often referred to by its perceptual correlate, namely pitch.
  • Speech Enhancement: Theory and Practice, Second Edition
    • Philipos C. Loizou (Author)
    • 2013 (Publication Date)
    • CRC Press (Publisher)
    Chapter 3, the opening and closing of the vocal folds during voicing produces periodic waveforms (voiced segments of speech). The duration of one cycle of the vocal folds’ opening and closing is known as the fundamental period, and the reciprocal of the fundamental period is known as the vocal pitch or Fundamental Frequency (F0). F0 ranges from around 80 Hz for male speakers to as high as 280 Hz for children [29]. The presence or absence of periodicity signifies the distinction between voiced and unvoiced sounds (e.g., between [d] and [t]). The F0 periodicity is also responsible for the perception of vocal pitch, intonation, prosody, and lexical tone in tonal languages.
    FIGURE 4.10 Example FFT-magnitude spectra of the vowel /eh/ corrupted by multitalker babble at 0 and 5 dB S/N. The noisy-magnitude spectra were shifted down for better clarity.
    The voiced segments of speech (e.g., vowels) are quasi-periodic in the time domain and harmonic in the frequency domain. The periodicity of speech is broadly distributed across frequency and time and is robust in the presence of noise. Figure 4.10 shows the FFT spectra of the vowel /eh/ in quiet and in noise. It is clear that the lower harmonics are preserved in noise, at least up to 1 kHz. This suggests that listeners have access to relatively accurate F0 information in noise. Such information, as we will discuss later (Section 4.3.3), is important for understanding speech in situations where two or more people are speaking simultaneously [32].
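The reciprocal relationship between fundamental period and F0, and the harmonic structure of voiced speech, can be made concrete with a little arithmetic. The sketch below converts the two F0 values quoted in the excerpt (80 Hz and 280 Hz) into fundamental periods and lists the harmonics that fall at or below 1 kHz, the region where the excerpt says harmonics tend to survive noise. The function names are hypothetical.

```python
def fundamental_period_ms(f0_hz):
    """Fundamental period in milliseconds: the reciprocal of F0."""
    return 1000.0 / f0_hz


def harmonics_up_to(f0_hz, limit_hz):
    """Harmonic frequencies k * F0 (k = 1, 2, ...) not exceeding limit_hz."""
    count = int(limit_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


for f0 in (80.0, 280.0):
    period = fundamental_period_ms(f0)
    harmonics = harmonics_up_to(f0, 1000.0)
    print(f"F0 = {f0:.0f} Hz: period = {period:.2f} ms, "
          f"{len(harmonics)} harmonics at or below 1 kHz")
```

A low-pitched voice therefore places many more harmonics below 1 kHz than a high-pitched one, which is one way to read the claim that the lower harmonics carry F0 information in noise.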

    4.2.4 RAPID SPECTRAL CHANGES SIGNALING CONSONANTS

    Unlike vowels, consonants are short in duration and have low intensity. Consequently, they are more vulnerable to noise or distortion, compared to vowels. The duration of the /b/ burst, for instance, can be as brief as 5–10 ms. Vowels on the other hand can last as long as 300 ms [33].
    Formant transitions associated with vowel or diphthong production are slow and gradual. In contrast, consonants (particularly stop consonants) are associated with rapid spectral changes and rapid formant transitions. In noise, these rapid spectral changes are preserved to some degree and serve as landmarks signaling the presence of consonants. Figure 4.11 shows example spectrograms of a sentence embedded in +5 dB babble noise. Note that the low-frequency (and intense) vowels alternate frequently with the high-frequency (and weak) consonants, resulting in sudden spectral changes. For instance, the low-frequency spectral dominance at near 500, 1000, and 1700 ms is followed by a sudden dispersion of spectral energy across the spectrum. These sudden spectral changes coincide with the onsets of consonants. Although for the most part the high-frequency information is smeared and heavily masked by the noise, the onsets of most of the consonants are preserved (see arrows in Figure 4.11