Essentially, there are two perspectives from which we can describe vowels and consonants, a phonetic and a phonological one. On this page, we will adopt the first perspective and try to establish some general features that enable us to distinguish between the two different sound classes.
The most important feature that distinguishes vowels from consonants acoustically is that vowels do not exhibit a complete closure of the vocal tract. The active articulators involved in producing a vowel are thus limited to:
The configuration of these articulators for any given vowel creates the complex frequencies that characterise it. We will explore the acoustic effects each individual articulator produces further below. As vowels are generally voiced, the source is represented by the energy coming up through the glottis, where the vocal folds are vibrating at relatively regular intervals. These regular vibrations can be seen rather clearly if we look at a spectrogram, which is a form of visual representation for sound waves.
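To make the idea of a regularly vibrating source a little more concrete, the following Python sketch replaces the voice source with a crude train of unit pulses and computes a very basic spectrogram from it. The sample rate of 8000 Hz, the fundamental frequency of 100 Hz and all function names are assumptions chosen purely for illustration; a real glottal source and real spectrogram software are considerably more refined.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for illustration
F0 = 100             # Hz, assumed fundamental frequency of the voice source

def pulse_train(duration_s, f0=F0, sample_rate=SAMPLE_RATE):
    """Crude stand-in for the voice source: one unit pulse per cycle."""
    period = sample_rate // f0                  # samples per cycle
    n = round(duration_s * sample_rate)         # total number of samples
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def magnitude_spectrum(frame, max_bin):
    """Plain DFT magnitudes for bins 0..max_bin (slow, but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(max_bin + 1):
        # Zero-valued samples contribute nothing, so they are skipped.
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame) if x != 0.0)
        mags.append(abs(total))
    return mags

def spectrogram(signal, frame_len, hop, max_bin):
    """One magnitude spectrum per frame: the time slices of a spectrogram."""
    return [magnitude_spectrum(signal[start:start + frame_len], max_bin)
            for start in range(0, len(signal) - frame_len + 1, hop)]

signal = pulse_train(0.3)
frames = spectrogram(signal, frame_len=800, hop=400, max_bin=50)
bin_hz = SAMPLE_RATE / 800                      # width of one DFT bin in Hz

# Frequencies (ignoring 0 Hz) at which the first frame carries real energy.
strong = [round(k * bin_hz)
          for k, m in enumerate(frames[0]) if k > 0 and m > 1.0]
print(strong)
```

Because the pulses recur every 10 ms, the energy in each frame is concentrated at whole-number multiples of 100 Hz, which is the kind of harmonic structure that a narrow-band spectrogram of a voiced sound displays.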
Simple vowels or monophthongs are characterised by formant patterns that exhibit relatively stable steady states, somewhat similar to the one represented in the above spectrogram, although the steady states are usually considerably shorter than in the example. We can further subdivide the vowels of English into ‘short’ and ‘long’ ones as follows:
The symbols used above are the ones we’ll be using for representing the vowels of RP and may well differ somewhat from the ones that you are used to. This is because most textbooks on English phonetics tend to be very conservative and will thus stick to more ‘traditional’ symbols. It is quite likely that whenever you find an unfamiliar representation, the one next to it in brackets is the one you will find in most of the relevant literature. Further symbols will be introduced for the relevant particular accents as and when necessary.
The distinction according to long and short vowels is also one that may be a little bit confusing at times because all vowels tend to change their duration to some extent, depending on their environment. It would therefore probably be better to refer to them as ‘predominantly long’ or ‘predominantly short’.
The graphic below shows the approximate positions for each of the above vowels within a vowel chart. You can click on the symbols to hear the corresponding sounds.
The vowel chart itself represents a kind of abstract model of the oral tract. Vowels that are located towards the top of the chart are referred to as close or high vowels, the ones at the bottom as open or low. This is also reflected in the horizontal lines running through the chart, which further subdivide the chart into three areas, close, central (mid) and open. Further distinctions are also possible.
The dots on the chart for each of the vowels represent the highest point of the tongue during the production of each respective vowel. These positions, however, are of course somewhat idealised, since each vowel has a certain space within which it can be produced. This space will vary from speaker to speaker and in a sense we have to go through a kind of abstraction process whenever we want to get used to the patterns of a particular speaker, especially because even one and the same speaker is very unlikely to produce the same vowel in exactly the same way twice in a row.
So how does the position of the tongue in this case actually influence the quality of the individual vowel? Well, as we have already discussed, the shape of the oral tract acts as a sort of filter for the sound coming up from the glottis. This sound can then further be modified by the position of the tongue. If the tongue is relatively low and flat, as for an /a(ː)/, then it is more or less only the shape of the oral tract itself that is responsible for the filtering of the vowel. However, as soon as we start moving the tongue away from this flat position, it begins to change the shape of the resonance cavity by dividing it into two areas of resonance, thereby changing the frequency/formant pattern produced. Translating this back into degree of openness and tongue height, we can observe that the first formant becomes lower the more we close the jaw and move the tongue to a closed position. In the phonetics literature, you’ll thus often find descriptions like “F1 is inversely related to vowel height”.
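The inverse relationship between F1 and vowel height can be illustrated with a few lines of Python. Please note that the F1 values below are rounded, hypothetical figures for an adult male voice, chosen only to show the direction of the effect; they are assumptions, not measurements of RP, and the three height labels simply follow the three areas of the vowel chart.

```python
# (vowel symbol, height label, assumed F1 in Hz): illustrative values only
ILLUSTRATIVE_F1 = [
    ("iː", "close", 280),
    ("uː", "close", 310),
    ("e",  "mid",   530),
    ("ɔː", "mid",   570),
    ("æ",  "open",  690),
    ("ɑː", "open",  710),
]

def mean_f1_by_height(vowels):
    """Average the F1 values of all vowels that share a height label."""
    grouped = {}
    for _symbol, height, f1 in vowels:
        grouped.setdefault(height, []).append(f1)
    return {height: sum(values) / len(values)
            for height, values in grouped.items()}

means = mean_f1_by_height(ILLUSTRATIVE_F1)
for height in ("close", "mid", "open"):
    print(f"{height:>5}: mean F1 = {means[height]:.0f} Hz")
```

With these assumed figures, the mean F1 rises step by step from the close vowels via the mid vowels to the open ones, in other words, the higher the vowel, the lower its first formant.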
But apart from being able to distinguish vowels on a range from open to closed, we can also distinguish them according to their degree of backness, as the two ‘vertical’ lines on the chart show. Here, we have a range from front via central to back. The two images below illustrate the two ‘extreme points’ for English.
The further back we move the mass of the tongue, the lower the second formant becomes, so that especially for higher back vowels the first and second formants tend to converge more and more. For English, it is also the case that lip rounding generally increases with the degree of backness. Since rounding effectively lengthens the vocal tract, it has the general effect of lowering all formants. The effect of the velum as an articulator is only marginal in English since English has no true nasal vowels, such as we find in French. However, when the velum is lowered during the production of nasal consonants adjacent to a vowel, this tends to influence the quality of the vowel in that anti-formants are produced, so that some of the natural formant frequencies of the vowel are effectively neutralised.
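Why a longer tract lowers all formants can be seen from a common simplification in acoustic phonetics, in which the vocal tract is treated as a uniform tube that is closed at the glottis and open at the lips. Such a tube resonates at F_n = (2n - 1) * c / (4 * L). The sketch below assumes a speed of sound of 350 m/s, a tract length of 17.5 cm and a hypothetical extra 2 cm for rounded, protruded lips; the function name is likewise just an illustration, and the model describes a neutral tube rather than any particular English vowel.

```python
SPEED_OF_SOUND_CM_S = 35000.0   # assumed speed of sound in warm, moist air

def tube_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]

neutral = tube_resonances(17.5)   # assumed adult vocal tract length in cm
rounded = tube_resonances(19.5)   # same tube, hypothetically 2 cm longer

for n, (f_neutral, f_rounded) in enumerate(zip(neutral, rounded), start=1):
    print(f"F{n}: {f_neutral:.0f} Hz -> {f_rounded:.0f} Hz")
```

For the 17.5 cm tube, the first three resonances come out at 500, 1500 and 2500 Hz, and every one of them drops once the tube is lengthened, since the length L appears in the denominator of each resonance.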
If you want to learn about the frequencies and characteristics of (RP) monophthongs, you can consult the relevant page of my Practical Phonetics course.
Diphthongs are combinations of two vowels – in other words, there is a movement or glide from one vowel to another. In a spectrogram, this can often be seen relatively clearly because they tend to display diverging formant patterns, such as in the spectrogram of the word house below.
In English, it is usually the first vowel element that is more prominent, whereas the second one tends to be reduced.
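The glide from one vowel quality to another can be approximated, very roughly, as a gradual movement between two formant targets. The following sketch interpolates linearly between an assumed open starting element and an assumed close front end point. Both pairs of formant values are hypothetical, as is the function name, and real glides are neither strictly linear nor guaranteed to reach the second target, especially as the second element tends to be reduced.

```python
def glide(start, end, steps):
    """Linearly interpolate (F1, F2) pairs from the start to the end target."""
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]

START = (750.0, 1300.0)   # assumed (F1, F2) of an open starting element, in Hz
END = (400.0, 2000.0)     # assumed (F1, F2) of a close front end point, in Hz

track = glide(START, END, steps=5)
gaps = [f2 - f1 for f1, f2 in track]   # distance between the two formants

for (f1, f2), gap in zip(track, gaps):
    print(f"F1 = {f1:6.1f} Hz   F2 = {f2:6.1f} Hz   F2 - F1 = {gap:6.1f} Hz")
```

In this particular case, F1 falls while F2 rises, so that the distance between the two formants grows from step to step, which is one way in which the formant tracks of a diphthong may diverge.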
We can again distinguish between two different types:
For diphthongs of the type a), the starting element is relatively open and there is a glide towards either a relatively closed front (ɪ) or back (ʊ) vowel. The chart immediately below shows the possible options.
Diphthongs of type b), on the other hand, have a relatively peripheral starting point, as the next chart demonstrates.
The above diphthongs of type b) generally only occur in non-rhotic accents of English, though, i.e. those that do not produce a post-vocalic /r/.
Similarly to diphthongs, triphthongs are a combination of simple vowels, only that this time, we have three in a row. As you can see, they are essentially formed by adding a schwa to one of the closing diphthongs described earlier, so that we end up with
However, there is a tendency in at least some accents towards smoothing, a process whereby a triphthong is progressively modified until it becomes monophthongal in character, so that e.g. [faɪə] becomes [faː], while in rhotic accents, the final element of the original diphthong becomes ‘r-coloured’.
Again, more information can be found on the vowel qualities page of the Practical Phonetics course.