The main difference between vowels and consonants is that some of the latter usually don’t show any clear formant patterns themselves, but rather affect the patterns of the vowels surrounding them. We have already looked at the general features of consonants in our introduction to acoustic phonetics. Here, we want to investigate specifically how these features can be identified by ‘visualising’ speech with analysis tools. In other words, we want to investigate the physical properties related to manner and place of articulation.
Amongst the consonants, the fricatives are the easiest to identify visually, partly due to their relative length compared to plosives, and partly due to the fact that they exhibit strong ‘random’ noise patterns in specific frequency areas, depending on their place of articulation. The noise patterns are due to the turbulent nature of fricatives, while their relative length is due to the fact that they can be sustained. The following graphic shows a realisation of the word station, with the two fricatives [s] and [ʃ] clearly marked in the transcription:
Amongst the fricatives, there is a fairly strong division in terms of place of articulation and intensity. The sibilants /s/ and /ʃ/ and their voiced counterparts, i.e. the alveolar and palato-alveolar fricatives, tend to be much stronger than the rest of the fricatives. Labio-dental and dental fricatives tend to be much weaker, and in English the dental ones in particular, /θ/ and /ð/, are often so weak that they readily undergo assimilation processes, e.g. in /ɪzðat/ ⇒ [ɪzzat]. Generally, fricatives produced further towards the front of the oral cavity have a higher frequency than those produced in the back, which explains the difference in the examples shown above and also why the glottal fricative /h/ has relatively low formant values that often actually mirror those of the following vowel. Labio-dental and dental fricatives are an exception to the rule, though, as the influence of the oral cavity on their frequencies is minimal because they are produced so far forward in the mouth. Overall, voiced fricatives tend to be weaker than their voiceless counterparts, but may exhibit a voice bar in the low frequency region.
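One simple way of putting a number on ‘where the noise sits’ is the spectral centre of gravity, i.e. the power-weighted mean frequency of a stretch of signal. The following is only a minimal sketch: the function names are made up for this example, and the two test signals are artificial mixtures of sine tones standing in for an [s]-like and an [ʃ]-like noise band, with centre frequencies chosen arbitrarily for illustration rather than measured from real speech.

```python
import cmath
import math


def tone_mixture(freqs_hz, sample_rate, n_samples):
    """Build a toy signal as the sum of equal-amplitude sine tones."""
    return [
        sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs_hz)
        for t in range(n_samples)
    ]


def spectral_centre_of_gravity(samples, sample_rate):
    """Return the power-weighted mean frequency (in Hz) of a short frame."""
    n = len(samples)
    weighted_sum = 0.0
    total_power = 0.0
    # Naive DFT over the non-negative frequency bins; slow, but fine for
    # a short frame and free of external dependencies.
    for k in range(n // 2 + 1):
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(coeff) ** 2
        weighted_sum += (k * sample_rate / n) * power
        total_power += power
    return weighted_sum / total_power if total_power else 0.0


SAMPLE_RATE = 16000   # Hz, assumed for the example
FRAME_LENGTH = 256    # samples

# Hypothetical stand-ins: energy centred on 6 kHz vs. 3 kHz.
s_like = tone_mixture([5000, 6000, 7000], SAMPLE_RATE, FRAME_LENGTH)
sh_like = tone_mixture([2500, 3000, 3500], SAMPLE_RATE, FRAME_LENGTH)

cog_s = spectral_centre_of_gravity(s_like, SAMPLE_RATE)
cog_sh = spectral_centre_of_gravity(sh_like, SAMPLE_RATE)
print(f"[s]-like frame:  {cog_s:.0f} Hz")
print(f"[ʃ]-like frame:  {cog_sh:.0f} Hz")
```

On real recordings one would apply the same measure to a frame cut from the middle of the fricative; the higher value would then be expected for the sound produced further forward.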
Oral plosives can usually be identified by a stop gap similar to a pause, which is caused by the closure phase immediately prior to the release. Voiceless initial plosives in most accents of English usually also exhibit a brief noisy element – similar to a very short fricative – before the onset of the following vowel, signalling the aspiration, as can be seen in the following example spectrogram of the word car, produced by a German speaker.
The period of aspiration in the preceding example can be identified relatively easily, both from the rather flat shape in the waveform, and the similarity to the following vowel in its ‘formant’ pattern in the spectrogram.
Some accents of English, e.g. Yorkshire English, do not necessarily have this aspiration, which makes initial voiceless plosives in these accents sound very similar to voiced plosives. In this case, the only difference between voiced and voiceless plosives is the voice onset time (VOT), i.e. whether the voicing begins before the release or after it.
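The VOT distinction just described can be sketched in a few lines of Python. This is purely illustrative: the function names, the example time points and the 30 ms boundary between a short and a long lag are assumptions made for the example, not values taken from the discussion above. In practice, the two time points would be read off the waveform or spectrogram by hand.

```python
def voice_onset_time_ms(release_time_s, voicing_onset_s):
    """VOT in milliseconds: negative if voicing starts before the release."""
    return (voicing_onset_s - release_time_s) * 1000.0


def describe_vot(vot_ms, long_lag_threshold_ms=30.0):
    """Give a rough label for a VOT value.

    The default threshold is an illustrative assumption only.
    """
    if vot_ms < 0:
        return "voicing lead (voicing begins before the release)"
    if vot_ms < long_lag_threshold_ms:
        return "short lag (little or no aspiration)"
    return "long lag (aspirated)"


# Hypothetical measurements (release time, voicing onset), in seconds.
vot_prevoiced = voice_onset_time_ms(0.250, 0.190)
vot_unaspirated = voice_onset_time_ms(0.250, 0.262)
vot_aspirated = voice_onset_time_ms(0.250, 0.320)

for vot in (vot_prevoiced, vot_unaspirated, vot_aspirated):
    print(f"{vot:7.1f} ms  {describe_vot(vot)}")
```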
For some time, it had been assumed that the place of articulation of plosives can be identified by particular identical formant transitions (loci) into or out of the surrounding vowels, no matter what the quality of the surrounding vowels was. This has recently been shown not to be the case, since there is a dependence between the transition and the qualities of the adjacent vowel, especially its F2 frequency. According to Ladefoged (2003: p. 163) it is better to use so-called locus equations, where one assumes an idealised locus of 900 Hz for bilabial plosives, of 1,560 Hz for alveolar and 1,280 Hz for velar ones and relates it to the F2 frequency of the adjacent vowel. Transitions towards F1 can usually be ignored since there is a general rise towards it in all plosives.
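To make the idea of relating a locus to the F2 of the adjacent vowel more concrete, here is a minimal sketch of one common formulation of a locus equation, in which the F2 at the onset of the transition is modelled as a straight-line function of the F2 in the vowel. This formulation is assumed here for illustration and is not necessarily the exact procedure Ladefoged describes; the function names are hypothetical, and the data points are invented so that they lie exactly on a line (with an arbitrary slope of 0.4) that yields the idealised alveolar locus of 1,560 Hz mentioned above.

```python
def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares fit of: f2_onset = slope * f2_vowel + intercept."""
    n = len(f2_vowel_hz)
    mean_x = sum(f2_vowel_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel_hz, f2_onset_hz)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def locus_from_fit(slope, intercept):
    """Frequency at which onset F2 and vowel F2 coincide on the fitted line."""
    return intercept / (1.0 - slope)


# Invented F2 values (Hz) for one plosive before four different vowels.
f2_in_vowel = [1000.0, 1500.0, 2000.0, 2300.0]
f2_at_onset = [1336.0, 1536.0, 1736.0, 1856.0]

slope, intercept = fit_locus_equation(f2_in_vowel, f2_at_onset)
locus = locus_from_fit(slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.0f} Hz, locus = {locus:.0f} Hz")
```

The slope expresses how strongly the transition depends on the vowel: a slope of 0 would mean a fixed onset frequency regardless of the vowel (the older assumption of an invariant locus), while a slope close to 1 would mean that the onset simply follows the vowel.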
As we have seen previously, the nasal consonants are the nasal counterparts of the voiced plosives in English, the main difference being that the air does not escape through the mouth, but rather through the nasal tract. This ‘redirection’ of the air flow is responsible for characteristic bands of voicing in the low frequency area (around the area of F1 for vowels). All other formants are considerably weakened, though, due to the damping effect of the nasal tract, so that nasal consonants often look very much like vowels that only have a first formant. You can see this in the following illustration of the filler /ɛ:m:/, spoken by a female informant:
Affricates are rather similar to voiceless aspirated plosives, except that the ‘aspiration phase’ is of course much more pronounced and longer than in the latter, as can be seen in the selection of the following graphic representing a female speaker’s realisation of the word manage:
Approximants, like nasals, usually exhibit formants, but these are fainter than in vowels. For laterals, there is usually an abrupt break in the formant pattern. The difference between a clear l (/l/) and a dark l (/ɫ/) can be seen fairly clearly in the formant patterns in the two graphics below, where the pattern for the latter is similar to that of a high back vowel (→ l-vocalisation). Each time, the selection (yellow background) marks the l.
The glides /j/ & /w/ are similar to /i:/ and /u:/ respectively, but it’s more difficult to detect any steady state patterns due to the abrupt movements. The main difference between the two is easily recognisable in the F2 pattern movements, which clearly show the absence of lip rounding in /j/ and, conversely, the increased rounding for /w/, as can be seen in the following two graphics of the words you and well respectively:
From an articulatory point of view, the similarities to the two high vowels can be explained in comparison to the two cardinals 1 & 8, which represent ‘extreme’ vowel qualities before a further narrowing of the articulators results in the production of the two glides.
And finally, the /ɹ/ glide can generally be recognised by a drop in the frequency of F3 in the adjacent vowel. The graphic below illustrates this quite nicely, once in the r in the middle of the word area and a second time, which is a little less clear, towards the end of the chunk, where the speaker is beginning to produce an intrusive r in anticipation of the preposition on.
Ladefoged, P. (2003). Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques. Oxford: Blackwell.
Johnson, K. (2003). Acoustic & Auditory Phonetics (2nd ed.). Oxford: Blackwell.