Speech perception is often portrayed as a decoding process that is simply the reverse of speech production (an encoding process), but this conception – depicted in the graph below – is potentially somewhat misleading.
The process of decoding is – if anything – even more complex, because the signal produced by the speaker does not usually arrive at the receiving end – i.e. the hearer’s ear – exactly as it was emitted. Instead, it is often modified by the medium through which it travels, as well as by any background noise that may interfere with our hearing. For example, if we are outside in a street with a lot of traffic noise, or in another noisy environment such as a pub or a concert, the signal may become quite distorted.
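How strongly background noise degrades a signal is usually expressed as the signal-to-noise ratio (SNR) in decibels. The following is a minimal sketch, assuming a pure 200 Hz tone as a stand-in for a voiced speech signal and Gaussian white noise as a stand-in for real background noise such as traffic or pub chatter; the function names (`rms`, `snr_db`, `add_noise`) are hypothetical helpers, not part of any phonetics toolkit.

```python
import math
import random

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, computed from RMS amplitudes."""
    return 20 * math.log10(rms(signal) / rms(noise))

def add_noise(signal, target_snr_db, seed=0):
    """Add Gaussian white noise scaled so that the mixture has the requested SNR.

    Returns (noisy_signal, noise). The Gaussian noise is only an assumed
    stand-in for real-world background noise.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that rms(signal) / rms(noise) matches the target ratio.
    wanted_noise_rms = rms(signal) / (10 ** (target_snr_db / 20))
    scale = wanted_noise_rms / rms(raw)
    noise = [scale * n for n in raw]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise

# A 200 Hz tone sampled at 8 kHz for 0.1 s (hypothetical stand-in for speech).
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]

for target in (20, 10, 5):
    noisy, noise = add_noise(tone, target)
    print(f"target {target:>2} dB -> measured {snr_db(tone, noise):.1f} dB")
```

The lower the SNR, the larger the share of the received waveform that comes from the noise rather than from the speaker, which is one simple way of quantifying the distortion described above.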
When the speech signal eventually arrives at the hearer’s ear, it travels through the outer ear to the eardrum, which separates the outer ear from the middle ear and is set in motion by the incoming sound. The eardrum transmits its vibrations to the auditory ossicles, i.e. the hammer (malleus), the anvil (incus) and the stirrup (stapes). These, in turn, conduct the vibrations through the oval window, which connects the middle ear to the inner ear. In doing so, the ossicles usually amplify the sound, but they may also help to protect the ear from excessive pressure, such as that caused by very loud noises.
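The amplification in the middle ear comes mainly from two factors: the force collected over the relatively large eardrum is concentrated onto the much smaller oval window, and the ossicles act as a small lever. The sketch below is a rough back-of-the-envelope calculation; the three numeric values are approximate figures of the kind quoted in introductory hearing-science texts and are assumptions here, not taken from this text. The protective function mentioned above is not modelled.

```python
import math

# Approximate values assumed for illustration only (not from the text above):
EARDRUM_AREA_MM2 = 55.0      # effective vibrating area of the eardrum
OVAL_WINDOW_AREA_MM2 = 3.2   # area of the stirrup's footplate in the oval window
OSSICULAR_LEVER_RATIO = 1.3  # mechanical advantage of the ossicular lever

def middle_ear_pressure_gain(eardrum_area, oval_window_area, lever_ratio):
    """Pressure gain: same force on a smaller area, multiplied by the lever ratio."""
    return (eardrum_area / oval_window_area) * lever_ratio

def to_db(pressure_ratio):
    """Convert a pressure ratio to decibels (20 * log10 for pressure quantities)."""
    return 20 * math.log10(pressure_ratio)

gain = middle_ear_pressure_gain(
    EARDRUM_AREA_MM2, OVAL_WINDOW_AREA_MM2, OSSICULAR_LEVER_RATIO
)
print(f"pressure gain ~ {gain:.1f}x, i.e. about {to_db(gain):.0f} dB")
```

Under these assumed values the area ratio contributes far more than the lever, which is why the size difference between the eardrum and the oval window is usually singled out as the main source of the gain.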
The inner ear is filled with fluid, which passes the vibrations coming from the middle ear on to the basilar membrane inside the snail-shaped cochlea. Higher frequencies make the membrane vibrate most strongly near its narrow, stiff end at the base of the cochlea, whereas the lowest frequencies tend to make almost the whole membrane vibrate. In this way, different nerve fibres are stimulated depending on the frequencies present in the signal, and the resulting nerve impulses are sent along the auditory nerve to the brain, where some form of mental representation is created.
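The idea that each frequency has its own place of maximum vibration along the basilar membrane can be made concrete with Greenwood's frequency-position function, a model not mentioned in this text and brought in here as an assumption. The sketch uses the commonly quoted human constants (A = 165.4, a = 2.1, k = 0.88), with x as the relative distance from the apex (x = 0) to the base (x = 1). It only predicts where the vibration peaks; it does not capture the broad spread of low-frequency vibration along the membrane.

```python
import math

# Greenwood's function F = A * (10 ** (a * x) - k) with commonly quoted human
# constants; these values are an assumption brought in for illustration.
GREENWOOD_A = 165.4      # scaling constant in Hz
GREENWOOD_ALPHA = 2.1    # slope constant per unit of relative membrane length
GREENWOOD_K = 0.88       # integration constant

def greenwood_frequency(x):
    """Characteristic frequency (Hz) at relative position x on the basilar membrane.

    x = 0.0 is the apex (wide, flexible end); x = 1.0 is the base
    (narrow, stiff end next to the oval window).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 (apex) and 1 (base)")
    return GREENWOOD_A * (10 ** (GREENWOOD_ALPHA * x) - GREENWOOD_K)

def place_for_frequency(freq_hz):
    """Inverse mapping: relative position (0 = apex, 1 = base) of peak vibration."""
    return math.log10(freq_hz / GREENWOOD_A + GREENWOOD_K) / GREENWOOD_ALPHA

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}  ->  {greenwood_frequency(x):8.0f} Hz")

for f in (250, 1000, 4000):
    print(f"{f:>5} Hz peaks at x = {place_for_frequency(f):.2f} (0 = apex, 1 = base)")
```

Running it shows frequencies rising from roughly 20 Hz at the apex to roughly 20 kHz at the base, which matches the usual description of high frequencies being registered near the base of the cochlea.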