Vowels from a Phonetic Perspective

Essentially, there are two perspectives from which we can describe vowels and consonants, a phonetic and a phonological one. On this page, we will adopt the first perspective and try to establish some general features that enable us to distinguish between the two different sound classes.


The most important feature that distinguishes vowels from consonants is articulatory: vowels are produced without any closure or narrow constriction of the vocal tract, so that the air can pass through the mouth without audible friction. The active articulators involved in producing a vowel are thus limited to:

  1. the lower jaw – responsible for the degree of openness
  2. the tongue – responsible for dividing the oral tract into separate areas of resonance
  3. the lips – responsible for the degree of roundedness
  4. the velum (optionally) – responsible for nasality

The configuration of these articulators for any given vowel creates the complex frequency pattern that characterises it. We will explore the acoustic effects each individual articulator produces further below. As vowels are generally voiced, the source is the energy coming up from the glottis, where the vocal folds vibrate at relatively regular intervals. These regular vibrations can be seen rather clearly in a spectrogram, a visual representation that shows how the intensity of the different frequencies in a sound changes over time.
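
To make the idea of a spectrogram more concrete, here is a minimal sketch of how one can be computed: the signal is cut into short overlapping frames, each frame is tapered with a window, and the magnitudes of its discrete Fourier transform give one column of the picture. The function names, the Hann window and the frame and hop sizes are illustrative assumptions; real tools use an FFT and add a decibel scale and colour mapping. Individual glottal pulses appear as vertical striations when the frames are shorter than one glottal period (a so-called wideband spectrogram).

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Hann window: tapers the frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Return one list of DFT magnitudes (bins 0..frame_len // 2) per frame.

    A naive O(n^2) DFT keeps the sketch dependency-free; real tools use an FFT.
    """
    window = hann(frame_len)
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        column = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            column.append(abs(coeff))
        columns.append(column)
    return columns


if __name__ == "__main__":
    sr = 8000  # sample rate in Hz (assumed)
    tone = [math.sin(2 * math.pi * 500 * t / sr) for t in range(1024)]
    spec = spectrogram(tone, frame_len=256, hop=128)
    peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(spec), "frames; peak at", peak_bin * sr / 256, "Hz")
```

For a pure 500 Hz tone, every column peaks in the bin for 500 Hz; with a vowel, the peaks would instead trace out the formants over time.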

The picture above contains the waveform (top half) and the associated spectrogram (bottom half) for the vowel [i]. The regular glottal pulses can be observed in the vertical striations that make the spectrogram look as if someone had run a fine comb through it from top to bottom. If you look at the spectrogram closely, you will also notice that the pulses are further apart and much less regular at the beginning (onset) and the end of the vowel. The irregularity at the beginning is due to the slightly abrupt start of voicing (glottal onset), because the sound was produced in complete isolation. The end exhibits clear signs of creaky voice: the vowel was sustained for a relatively long time and I simply ran out of breath, so that there was no longer enough energy for the vocal cords to keep vibrating regularly.
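
The spacing of the striations is the period of the glottal vibration, and its inverse is the fundamental frequency. The sketch below assumes we already have a (hypothetical) list of pulse times in seconds and turns them into periods, a mean F0, and a "local jitter" value: the mean absolute difference between consecutive periods divided by the mean period, which is how Praat defines jitter (local). An irregular onset or a creaky offset like the ones described above would typically show a higher jitter value than a steady stretch.

```python
def periods(pulse_times: list[float]) -> list[float]:
    """Intervals in seconds between successive glottal pulses."""
    return [later - earlier for earlier, later in zip(pulse_times, pulse_times[1:])]


def mean_f0(pulse_times: list[float]) -> float:
    """Fundamental frequency in Hz, taken as 1 / mean period."""
    p = periods(pulse_times)
    return len(p) / sum(p)


def local_jitter(pulse_times: list[float]) -> float:
    """Mean absolute difference of consecutive periods divided by the mean period."""
    p = periods(pulse_times)
    diffs = [abs(b - a) for a, b in zip(p, p[1:])]
    return (sum(diffs) / len(diffs)) / (sum(p) / len(p))


# Hypothetical pulse times (seconds): a steady stretch at 125 Hz
# and an irregular, creak-like stretch.
steady = [i * 0.008 for i in range(11)]
creaky = [0.0, 0.008, 0.016, 0.030, 0.040]

print(round(mean_f0(steady), 1), round(local_jitter(steady), 3))  # 125.0 0.0
print(round(mean_f0(creaky), 1), round(local_jitter(creaky), 3))  # 100.0 0.333
```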

Another thing that becomes immediately obvious when looking at the spectrogram above is that certain frequencies in the speech signal have higher intensities than others. Here, these frequencies are shown in colours ranging between yellow and dark red and appear as fairly clear, steady horizontal bands. The technical term for these bands is formants, and each vowel exhibits characteristic formant patterns. Formants are usually labelled F, followed by a number starting with 1, i.e. F1, F2, F3, etc. Usually, the relevant characteristics are to be found in the first three formants, and reference to any higher formants is rarely made. You will also encounter the label F0, pronounced [ˈɛfˈziːrəʊ] or [ˈɛfˈnɔːt], but this is not really a formant: it refers to the fundamental frequency, i.e. the rate at which the vocal folds vibrate at the relevant point in the utterance, which we perceive as pitch.
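
Because F0 is a property of the source (how often the glottal pulses repeat) rather than of the vocal-tract filter, it can be estimated from the periodicity of the waveform itself. A classic, very simplified approach is autocorrelation: the waveform is compared with delayed copies of itself, and the delay (lag) at which it matches best is taken as the period. The sketch below assumes a clean, fully voiced signal and a search range of 75 to 400 Hz; real pitch trackers, such as the one in Praat, add normalisation, voicing decisions and checks against octave errors.

```python
import math


def estimate_f0(signal: list[float], sr: int,
                f0_min: float = 75.0, f0_max: float = 400.0) -> float:
    """Return sr / lag for the lag with the largest autocorrelation.

    Only lags corresponding to f0_min..f0_max Hz are searched (assumed range).
    """
    min_lag = int(sr / f0_max)
    max_lag = min(int(sr / f0_min), len(signal) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation: sum of products with a delayed copy.
        r = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag


# Synthetic 'voiced' signal: harmonics of 125 Hz with 1/h amplitudes
# (a crude stand-in for a glottal source), 1024 samples at 8 kHz.
sr = 8000
signal = [sum(math.sin(2 * math.pi * 125 * h * n / sr) / h for h in range(1, 11))
          for n in range(1024)]
print(estimate_f0(signal, sr))  # 125.0
```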

Simple Vowels

Simple vowels or monophthongs are characterised by formant patterns that exhibit relatively stable steady states, somewhat similar to the one represented in the above spectrogram, although the steady states are usually considerably shorter than in the example. We can further subdivide the vowels of English into ‘short’ and ‘long’ ones as follows:

The symbols used above are the ones we’ll be using for representing the vowels of RP and may well differ somewhat from the ones that you are used to. This is because most textbooks on English phonetics tend to be very conservative and thus stick to more ‘traditional’ symbols. Whenever you find an unfamiliar representation, the one next to it in brackets is most likely the one you will find in most of the relevant literature. Further symbols will be introduced for particular accents as and when necessary.

The distinction between long and short vowels may also be a little confusing at times, because all vowels change their duration to some extent depending on their environment: the vowel in beat, for example, is considerably shorter than the one in bead, because it is followed by a voiceless consonant. It would therefore probably be better to refer to them as ‘predominantly long’ or ‘predominantly short’.

The graphic below shows the approximate positions for each of the above vowels within a vowel chart.

[Vowel chart: approximate positions of the RP monophthongs]

The vowel chart itself represents a kind of abstract model of the oral tract. Vowels located towards the top of the chart are referred to as close or high vowels, those at the bottom as open or low vowels. This is also reflected in the horizontal lines running through the chart, which subdivide it into three areas: close, mid and open. Finer distinctions, such as close-mid and open-mid, are also possible.

The dots on the chart represent the highest point of the tongue during the production of each vowel. These positions are, of course, somewhat idealised, since each vowel has a certain space within which it can be produced. This space varies from speaker to speaker, and even one and the same speaker is very unlikely to produce the same vowel in exactly the same way twice in a row. Whenever we want to get used to the patterns of a particular speaker, we therefore have to go through a kind of abstraction process.

  1. Going through the chart, practise imitating all the vowels, first going through them in an anticlockwise direction; then see if you can ‘jump’ around the chart and accurately produce each sound.
  2. Once you feel sufficiently confident in producing the vowels, pay particular attention to the position of the tongue inside your mouth and try to identify where the mass of the tongue is in relation to the roof of your mouth.
  3. Think about whether each sound occurs in your own accent or not. If it doesn’t, how would you ‘replace’ it in your own particular system and why?
  4. Try to find suitable examples for each of the vowels above. Create both orthographic and phonetic transcriptions for each example (to the best of your current abilities).

So how does the position of the tongue in this case actually influence the quality of the individual vowel? Well, as we have already discussed, the shape of the oral tract acts as a sort of filter for the sound coming up from the glottis. This sound can then further be modified by the position of the tongue. If the tongue is relatively low and flat, as for an /a(ː)/, then it is more or less only the shape of the oral tract itself that is responsible for the filtering of the vowel. However, as soon as we start moving the tongue away from this flat position, it begins to change the shape of the resonance cavity by dividing it into two areas of resonance, thereby changing the frequency/formant pattern produced. Translating this back into degree of openness and tongue height, we can observe that the first formant becomes lower the more we close the jaw and move the tongue to a closed position. In the phonetics literature, you’ll thus often find descriptions like “F1 is inversely related to vowel height”.
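
The filtering described in this paragraph can be sketched numerically. The glottal source is modelled (very crudely, as an assumption) as a series of harmonics at multiples of F0 with amplitudes falling off as 1/h, and each formant as a standard second-order resonance with a centre frequency and a bandwidth; the resonances are applied one after another, as in a cascade formant synthesiser. The formant and bandwidth values are illustrative assumptions, not measurements. Lowering F1, as happens when the jaw and tongue are raised, moves the strongest low-frequency harmonic down with it.

```python
import math


def resonance_gain(f: float, formant: float, bandwidth: float) -> float:
    """Magnitude response at f (Hz) of a two-pole resonance.

    Poles at -pi*B +/- j*2*pi*F; normalised so that the gain at 0 Hz is 1.
    """
    sigma = math.pi * bandwidth
    w, w0 = 2 * math.pi * f, 2 * math.pi * formant
    return (sigma**2 + w0**2) / (math.hypot(sigma, w - w0) * math.hypot(sigma, w + w0))


def filtered_harmonics(f0: float, formants: list[tuple[float, float]],
                       n_harmonics: int) -> list[tuple[float, float]]:
    """(frequency, amplitude) of each source harmonic after the formant filters."""
    result = []
    for h in range(1, n_harmonics + 1):
        freq = h * f0
        amp = 1.0 / h  # assumed source spectrum: amplitude falls off as 1/h
        for formant, bandwidth in formants:
            amp *= resonance_gain(freq, formant, bandwidth)
        result.append((freq, amp))
    return result


def strongest_harmonic(harmonics: list[tuple[float, float]]) -> float:
    """Frequency of the harmonic with the largest amplitude."""
    return max(harmonics, key=lambda fa: fa[1])[0]


# Hypothetical (formant Hz, bandwidth Hz) pairs for an open-ish and a close-ish vowel.
open_ish = [(500, 60), (1500, 90), (2500, 150)]
close_ish = [(300, 60), (2200, 90), (3000, 150)]
print(strongest_harmonic(filtered_harmonics(125, open_ish, 30)))   # 500
print(strongest_harmonic(filtered_harmonics(125, close_ish, 30)))  # 250
```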

But apart from being able to distinguish vowels on a range from open to closed, we can also distinguish them according to their degree of backness, as the two ‘vertical’ lines on the chart show. Here, we have a range from front via central to back. The two images below illustrate the two ‘extreme points’ for English.


The further back we move the mass of the tongue, the lower the second formant becomes, so that, especially for higher back vowels, the first and second formants tend to converge. For English, it is also the case that lip rounding generally increases with the degree of backness. Since rounding (usually accompanied by some lip protrusion) lengthens the vocal tract and narrows its opening at the lips, it has the general effect of lowering all formants. The effect of the velum as an articulator is only marginal in English, since English, unlike French, has no true nasal vowels. However, when the velum is lowered during the production of nasal consonants adjacent to a vowel, the vowel tends to be partly nasalised: anti-formants are introduced, so that some of the vowel’s formants are weakened or effectively cancelled out.
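
The effect of lengthening the vocal tract can be illustrated with a standard textbook idealisation that is not part of the discussion above: treating the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)c / 4L, with c the speed of sound and L the tube length. With the commonly assumed values c ≈ 350 m/s and L ≈ 17.5 cm, this gives roughly 500, 1500 and 2500 Hz, close to the formants of a neutral, schwa-like vowel. The sketch only captures the lengthening part of lip rounding (the extra 1 cm is an assumed figure); the narrowed lip opening, which also lowers the formants, is beyond what a uniform tube can show.

```python
SPEED_OF_SOUND = 350.0  # m/s, common textbook value for warm, moist air (assumption)


def tube_formants(length_m: float, n: int = 3, c: float = SPEED_OF_SOUND) -> list[float]:
    """Resonances of a uniform tube closed at one end and open at the other:
    F_k = (2k - 1) * c / (4 * L) for k = 1..n."""
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]


neutral = tube_formants(0.175)  # about 17.5 cm: a typical adult male vocal tract (assumed)
rounded = tube_formants(0.185)  # the same tract lengthened by 1 cm of lip protrusion (assumed)

for k, (f_n, f_r) in enumerate(zip(neutral, rounded), start=1):
    print(f"F{k}: {f_n:.0f} Hz -> {f_r:.0f} Hz")
# F1: 500 Hz -> 473 Hz
# F2: 1500 Hz -> 1419 Hz
# F3: 2500 Hz -> 2365 Hz
```

Every resonance is inversely proportional to L, which is why lengthening lowers all formants at once rather than just one of them.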

If you want to learn about the frequencies and characteristics of (RP) monophthongs, you can consult the relevant page of my Practical Phonetics course.


Diphthongs

Diphthongs are combinations of two vowels: in other words, there is a movement or glide from one vowel quality to another. In a spectrogram, this can often be seen relatively clearly, because the formants move over the course of the vowel instead of remaining in a steady state, as in the spectrogram of the word house below.

In English, it is usually the first vowel element that is more prominent, whereas the second one tends to be reduced.

We can again distinguish between two different types:

  1. closing: eɪ, aɪ, ɔɪ, aʊ, əʊ
  2. centring: ɪə, ɛə (eə), ʊə

For closing diphthongs (type 1), the starting element is relatively open and there is a glide towards either a relatively close front (ɪ) or back (ʊ) vowel. The chart immediately below shows the possible options.

Centring diphthongs (type 2), on the other hand, have a relatively peripheral starting point and glide towards the central vowel schwa (ə), as the next chart demonstrates.
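
The two types differ only in where the glide ends: closing diphthongs end in ɪ or ʊ, centring diphthongs in ə. As a small illustration (the function name is hypothetical), the type can therefore be read off the final symbol of the transcriptions used on this page. This assumes each symbol is a single Unicode code point, as in the lists above, and would need adjusting for other transcription conventions.

```python
CLOSING_TARGETS = {"ɪ": "front", "ʊ": "back"}  # glide towards a close vowel
CENTRING_TARGET = "ə"                             # glide towards schwa


def diphthong_type(symbol: str) -> str:
    """Classify an RP diphthong by the vowel its glide ends in."""
    final = symbol[-1]
    if final in CLOSING_TARGETS:
        return f"closing ({CLOSING_TARGETS[final]})"
    if final == CENTRING_TARGET:
        return "centring"
    raise ValueError(f"not a diphthong in this system: {symbol!r}")


for d in ["eɪ", "aɪ", "ɔɪ", "aʊ", "əʊ", "ɪə", "ɛə", "ʊə"]:
    print(d, diphthong_type(d))
```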

The centring diphthongs generally only occur in non-rhotic accents of English, i.e. those in which /r/ is not pronounced after a vowel, as in car or card.

  1. Try to find suitable examples for each of the diphthongs above and to write down/transcribe your examples.
  2. Think about whether all of these diphthongs would occur in your own accent. If they don’t, think about whether some of them may be considered old-fashioned or whether your accent may simply prefer long monophthongs instead.


Triphthongs

Like diphthongs, triphthongs are combinations of simple vowels, only this time we have three in a row. They are essentially formed by adding a schwa to one of the closing diphthongs described earlier, so that we end up with /eɪə/ as in player, sayer, layer; /aɪə/ as in fire, tyre; /aʊə/ as in hour, power, tower; /ɔɪə/ as in loyal, royal; and /əʊə/ as in lower, mower, rower. However, in at least some accents there is a tendency towards smoothing, a process whereby the middle element of a triphthong is weakened or lost, so that the sequence becomes more monophthongal in character and, e.g., [faɪə] becomes [faə] and eventually [faː]. In rhotic accents, on the other hand, the final element is ‘r-coloured’ instead.
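
Both processes in this paragraph can be expressed as simple string operations on transcriptions. The sketch below (function names hypothetical) builds the five triphthongs by appending a schwa to the closing diphthongs and then applies a deliberately simplified two-step model of smoothing: first the middle element is dropped, then what is left is merged into a long monophthong. It reproduces the [faɪə] to [faː] example, but actual smoothed qualities vary with accent and word, so it should not be read as a general rule.

```python
CLOSING_DIPHTHONGS = ["eɪ", "aɪ", "aʊ", "ɔɪ", "əʊ"]
SCHWA = "ə"


def triphthong(diphthong: str) -> str:
    """Form a triphthong by adding a schwa to a closing diphthong."""
    return diphthong + SCHWA


def smooth(word: str, tri: str, steps: int) -> str:
    """Simplified smoothing of the triphthong `tri` inside `word`.

    steps=1 drops the middle element (aɪə -> aə);
    steps=2 also merges the rest into a long vowel (aə -> aː).
    """
    if steps == 1:
        replacement = tri[0] + tri[2]
    elif steps == 2:
        replacement = tri[0] + "ː"
    else:
        raise ValueError("steps must be 1 or 2")
    return word.replace(tri, replacement)


print([triphthong(d) for d in CLOSING_DIPHTHONGS])
print(smooth("faɪə", "aɪə", 1), smooth("faɪə", "aɪə", 2))  # faə faː
```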

Again, more information can be found on the vowel qualities page of the Practical Phonetics course.
