Identifying Vowels and Their Particular Qualities

As we have seen, vowels differ from consonants mainly in that the flow of air through the (mainly) oral tract is almost completely unimpeded. We have also stated that vowels are in most cases voiced, which further distinguishes them from voiceless consonants. Voicing can be observed in the vertical striations in a spectrogram or the periodicity in a waveform, as depicted in the illustration below.

periodicity & vertical striations as indicators of voicing
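To make the idea of periodicity a little more concrete, here is a minimal sketch of how it can be detected numerically using autocorrelation. It is not what WaveSurfer does internally. The function name `estimate_f0`, the 75 to 400 Hz search range and the synthetic test signals (a pure 100 Hz tone and white noise standing in for real recordings) are all my own assumptions for illustration.

```python
import math
import random


def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=400.0):
    """Return (f0_hz, strength) from the strongest autocorrelation peak.

    strength is the peak divided by the signal's mean power: values near 1
    suggest a strongly periodic (voiced) stretch, values near 0 suggest noise.
    """
    n = len(samples)
    power = sum(s * s for s in samples) / n
    if power == 0:
        return 0.0, 0.0
    min_lag = int(sample_rate / f0_max)  # shortest period we consider
    max_lag = int(sample_rate / f0_min)  # longest period we consider
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        # Average product of the signal with a copy of itself shifted by `lag`.
        value = sum(samples[i] * samples[i + lag] for i in range(overlap)) / overlap
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag, best_value / power


SAMPLE_RATE = 8000
# Synthetic stand-ins for real recordings: a 100 Hz tone and white noise.
voiced = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(800)]
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(800)]

f0_voiced, strength_voiced = estimate_f0(voiced, SAMPLE_RATE)
f0_noise, strength_noise = estimate_f0(noise, SAMPLE_RATE)
print(round(f0_voiced), round(strength_voiced, 2), round(strength_noise, 2))
```

For the periodic signal the strength comes out close to 1, whereas for the noise it stays low, which is the numerical counterpart of seeing (or not seeing) regular striations.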

However, since periodicity/voicing are also features of voiced consonants, we need further means of distinguishing them from vowels. One such feature is that vowels usually have at least three or four fairly distinct formants. These are patterns that indicate the quality of the individual vowel and manifest themselves as areas of high intensity in specific frequency regions, which appear as more or less clear bands in spectrograms. The previous graphic shows these formants for a sustained /i:/ that I produced. The thin red, green and blue lines running through the spectrograms have been computed automatically by WaveSurfer and help us to identify the first three formants. As you can see by looking at the green line, identifying F₂ would otherwise be fairly difficult because it is relatively faint and tends to ‘blend into’ F₃.

Identifying Individual Formant Patterns

Monophthongs

Besides being recognisable by ear, most monophthongs can be identified by their characteristic patterns for the first and second formants. In order to identify the frequency of the individual formants, it is best to look for steady state periods, i.e. periods in which the formant patterns remain relatively stable. Usually those periods occur somewhere near the middle of the vowel because the beginnings and ends may be influenced by the effects of neighbouring consonants, which we’ll discuss later.
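If you have exported frame-by-frame formant values, the search for a steady state can be sketched roughly as follows. The function `steadiest_window`, the stability measure (summed range of F₁ and F₂) and the track values are invented for illustration; they are not measurements from my recordings.

```python
def steadiest_window(track, width):
    """Return (start_index, mean_f1, mean_f2) of the most stable window.

    track is a list of (f1, f2) pairs in Hz, one per analysis frame.
    Stability is measured as the summed range (max - min) of F1 and F2.
    """
    best_start, best_spread = 0, float("inf")
    for start in range(len(track) - width + 1):
        window = track[start:start + width]
        f1s = [f1 for f1, _ in window]
        f2s = [f2 for _, f2 in window]
        spread = (max(f1s) - min(f1s)) + (max(f2s) - min(f2s))
        if spread < best_spread:
            best_start, best_spread = start, spread
    window = track[best_start:best_start + width]
    mean_f1 = sum(f1 for f1, _ in window) / width
    mean_f2 = sum(f2 for _, f2 in window) / width
    return best_start, mean_f1, mean_f2


# Invented values: transitions at both edges, a plateau in the middle.
track = [(400, 1700), (340, 1950), (300, 2150), (285, 2240), (280, 2250),
         (282, 2245), (290, 2200), (330, 2000), (380, 1800)]
start, f1, f2 = steadiest_window(track, width=3)
print(start, round(f1), round(f2))  # prints: 3 282 2245
```

The means over the chosen window are the values you would then read off as F₁ and F₂ for that vowel.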

The formant patterns are determined by the relative intensity of the frequencies caused by the different constellations within the oral cavity. As pointed out on some of the previous pages, the frequencies of F₁ and F₂ change mainly because the tongue separates the oral cavity into two more or less separate resonance chambers, and understanding this also makes it easier to see why the two formants are influenced in the way they are. Essentially, there are two rules that help us to relate formant frequencies to oral tract constellations:

F₁ is inversely related to vowel height, i.e. the higher the mass of the tongue, the lower the first formant.
F₂ is related to the degree of backness, i.e. the further back the mass of the tongue is, the lower the second formant will be.
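The two rules can be expressed as a small comparison between two vowels. This is only a sketch: the function name `compare_vowels` is hypothetical, and the two formant pairs are illustrative round numbers, not measurements and not values taken from any published table.

```python
def compare_vowels(a, b):
    """Describe vowel a relative to vowel b using the two rules.

    a and b are (f1, f2) pairs in Hz.
    """
    f1_a, f2_a = a
    f1_b, f2_b = b
    # Rule 1: lower F1 means a higher tongue position.
    if f1_a < f1_b:
        height = "higher"
    elif f1_a > f1_b:
        height = "lower"
    else:
        height = "same height"
    # Rule 2: lower F2 means a tongue position further back.
    if f2_a > f2_b:
        backness = "more front"
    elif f2_a < f2_b:
        backness = "further back"
    else:
        backness = "same backness"
    return height, backness


# Illustrative round numbers only.
close_front = (280, 2250)
open_back = (700, 1100)

print(compare_vowels(close_front, open_back))  # prints: ('higher', 'more front')
```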

The following illustration, based on an adapted version of John Wells’ table of mean formant values of RP, relates formant values to positions within an imaginary vowel chart. Please note that in order to establish this connection, the scales for the x- and y-axes actually need to be inverted, an idea that I took from Peter Ladefoged’s excellent introduction to vowels in his book Vowels and Consonants.
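The inversion of the axes is easy to get wrong, so here is a minimal sketch of the mapping, together with a crude text-only chart. The axis limits (200 to 900 Hz for F₁, 600 to 2600 Hz for F₂), the function names and the three formant pairs are assumptions chosen purely for illustration.

```python
def chart_position(f1, f2, f1_range=(200, 900), f2_range=(600, 2600)):
    """Map (F1, F2) in Hz to (x, y) in 0..1 with both axes inverted.

    x = 0 is the left edge (highest F2, i.e. front);
    y = 0 is the top edge (lowest F1, i.e. high/close).
    """
    f1_lo, f1_hi = f1_range
    f2_lo, f2_hi = f2_range
    x = (f2_hi - f2) / (f2_hi - f2_lo)
    y = (f1 - f1_lo) / (f1_hi - f1_lo)
    return x, y


def text_chart(vowels, cols=40, rows=10):
    """Place one-character labels on a text grid, front on the left, high on top."""
    grid = [[" "] * cols for _ in range(rows)]
    for label, (f1, f2) in vowels.items():
        x, y = chart_position(f1, f2)
        col = min(cols - 1, max(0, round(x * (cols - 1))))
        row = min(rows - 1, max(0, round(y * (rows - 1))))
        grid[row][col] = label
    return "\n".join("".join(line) for line in grid)


# Illustrative round numbers, not values from the reference table.
vowels = {"i": (280, 2250), "a": (700, 1100), "u": (310, 900)}
chart = text_chart(vowels)
print(chart)
```

With these values, the label with the highest F₂ and lowest F₁ ends up in the top left corner, just as the high front vowels do in a conventional vowel chart.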

Let’s practise identifying monophthongs via their formant patterns:

First download John Wells’ table of formants from the link above (or simply keep the link window open).
Next, download my recording of English vowels and open it in WaveSurfer, using the configuration ‘Speech Analysis’.
Zoom in on each vowel in turn, far enough that you can identify the particular pattern of periodicity in the waveform.
Create your own rough chart on paper.
Measure the first two formants of each vowel, plot the values and compare them to the RP formant chart.
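Once you have your measurements, the comparison in the last step can also be done numerically. The sketch below simply picks the reference entry closest to a measured (F₁, F₂) pair in plain Hz distance, which is a simplification. The function name `nearest_reference` is hypothetical, and the reference values are placeholders that you would need to replace with the means from the table.

```python
import math


def nearest_reference(measured, reference):
    """Return the label of the reference vowel closest to measured (f1, f2)."""
    f1, f2 = measured
    return min(
        reference,
        key=lambda label: math.hypot(f1 - reference[label][0],
                                     f2 - reference[label][1]),
    )


# Placeholder values: replace these with the means from the reference table.
reference = {
    "vowel A": (300, 2300),
    "vowel B": (700, 1100),
    "vowel C": (320, 900),
}
measured = (290, 2200)  # an invented measurement

print(nearest_reference(measured, reference))  # prints: vowel A
```

Because F₂ covers a much wider frequency range than F₁, a plain distance in Hz gives F₂ more weight, so treat the result as a rough guide only.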

Diphthongs & Triphthongs

Diphthongs and triphthongs are essentially just combinations of monophthongs, with the main difference that the formant patterns associated with them usually show distinct movements from one vowel pattern to another. We can see this quite clearly in the diverging formant patterns in the spectrogram of the word choice below, where F₁ and F₂ are much closer together for the relatively low back vowel [ɔ] at the beginning and then move apart distinctly when it changes into the high front vowel [ɪ].

spectrogram of the word choice
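The divergence of F₁ and F₂ can be checked on a formant track by comparing the gap between the two formants at the start and at the end of the vowel. The track below is invented to resemble a movement from an [ɔ]-like to an [ɪ]-like pattern; it is not measured from the recording, and the function name is hypothetical.

```python
def formant_gap_change(track):
    """Return (gap_at_start, gap_at_end), where gap = F2 - F1 in Hz."""
    (f1_start, f2_start), (f1_end, f2_end) = track[0], track[-1]
    return f2_start - f1_start, f2_end - f1_end


# Invented (f1, f2) values in Hz, one pair per analysis frame.
choice_track = [(550, 900), (520, 1100), (480, 1400), (430, 1750), (400, 2000)]

start_gap, end_gap = formant_gap_change(choice_track)
diverging = end_gap > start_gap
print(start_gap, end_gap, diverging)  # prints: 350 1600 True
```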

In English diphthongs, the first element also tends to be more prominent and often longer than the second one.

A similar pattern to the one seen above can also be recognised in the other closing diphthongs ending in [ɪ], whereas the change in pattern is much less clear for the ones ending in [ʊ]. Let’s investigate this:

Download my recording of closing diphthongs.
Open the file in WaveSurfer.
Analyse and plot the changes in formant frequency in a vowel chart with inverted frequencies as we did above.

Centring diphthongs, on the other hand, all display more or less clear movements towards the formant pattern for the neutral (central) shwa, as you can verify by analysing and plotting my recording of the centring diphthongs.
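A movement towards shwa can be tested by checking whether the end of a formant track lies closer to a shwa-like pattern than its beginning. As the target I assume the idealised values of roughly 500 Hz and 1500 Hz that are often quoted for a neutral vocal tract; a real speaker's shwa will differ. The track is invented and the function name is hypothetical.

```python
import math

# Idealised neutral-tract values in Hz: an assumption, not a measurement.
SHWA = (500, 1500)


def moves_towards(track, target=SHWA):
    """Return (is_closer_at_end, start_distance, end_distance) in Hz."""
    start = math.hypot(track[0][0] - target[0], track[0][1] - target[1])
    end = math.hypot(track[-1][0] - target[0], track[-1][1] - target[1])
    return end < start, start, end


# Invented (f1, f2) track starting from an [ɪ]-like pattern.
near_track = [(300, 2200), (350, 2000), (420, 1750), (480, 1550)]

closer, start_dist, end_dist = moves_towards(near_track)
print(closer, round(start_dist), round(end_dist))  # prints: True 728 54
```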

Since triphthongs are combinations of closing diphthongs + shwa, they exhibit similar final patterns to the centring diphthongs, while their initial patterns look like those of the closing ones. Again, you can verify this by downloading, analysing and plotting my recording of triphthongs.

Voicing Characteristics

When we describe normal voicing in speech, we can also talk of modal phonation or voice. Two clear ways of deviating from modal voice are breathy voice, where there is an excess of breath superimposed on the phonation, and creaky voice, where there is a large amount of irregularity in the voicing, which may also result in a lower pitch, as can be seen and heard quite nicely in the illustration below.

The wide spacing of the vertical striations in the selection in this example is a very distinctive feature of this type of creaky voice, especially because the voice of the speaker is usually characterised by a relatively high overall pitch. If we want to represent creaky voice in our transcriptions, we use a tilde diacritic below the affected phoneme symbol(s), as in [ɪ̰]. The creak itself has most likely been caused by the speaker’s failure to link the two words be and able, so that she creates a hiatus. A similar effect can often be found when (syllable-)final plosives are replaced by glottal stops. A further feature we can spot in the sound file is that she also vocalises the final l, so that her final realisation is something like [bɪ̰ḛɪ̰b̰ɯ].
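Since each vertical striation corresponds to one glottal pulse, the spacing between striations translates directly into pitch, and the variation in spacing into irregularity. The sketch below assumes you have noted down the pulse times by hand; both lists of times are invented, and the irregularity measure (mean absolute difference between neighbouring periods, divided by the mean period) is one common way of expressing local jitter, not necessarily the one your analysis program uses.

```python
def cycle_stats(pulse_times):
    """Return (mean_f0_hz, relative_jitter) from glottal pulse times in seconds."""
    periods = [b - a for a, b in zip(pulse_times, pulse_times[1:])]
    mean_period = sum(periods) / len(periods)
    # Cycle-to-cycle change in period, relative to the mean period.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    jitter = (sum(diffs) / len(diffs)) / mean_period
    return 1.0 / mean_period, jitter


# Invented pulse times: evenly spaced at 5 ms (modal) vs. wide and uneven (creaky).
modal = [i * 0.005 for i in range(9)]
creaky = [0.000, 0.014, 0.034, 0.047, 0.069, 0.081, 0.104]

modal_f0, modal_jitter = cycle_stats(modal)
creaky_f0, creaky_jitter = cycle_stats(creaky)
print(round(modal_f0), round(creaky_f0))
```

Striations that are far apart give a low value for the pitch, and uneven spacing gives a high value for the jitter, which together match the description of creaky voice above.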

Sources & Further Reading:

Ladefoged, P. (2005). Vowels and Consonants (2nd ed.). Oxford: Blackwell.