Sound Files & Formats

Audio Data vs. Other Types of Electronic Data

As you may or may not be aware, sound files tend to be rather different from ordinary text or word-processor files. The first thing you’ll immediately notice is that – usually – they are much larger, due to the type of information stored in them. Whereas in uncompressed text files a single character (letter) is usually stored as one or at most a few bytes on the computer, in multimedia data such as sound files a large number of values has to be stored for each second of a recording in order to convert the sound pressure waves into a digital signal. Exactly how many values are stored per second mainly depends on three factors: the sampling rate, the number of channels (mono or stereo), and the quantisation (number of bits).

A typical sampling rate for a CD-quality recording is 44.1 kHz, i.e. 44,100 samples per second, which is far higher than necessary for most of our analysis purposes. On a CD we also usually have stereo (i.e. two-channel) recordings, something that is rarely necessary in speech analysis either. The quantisation level that we’ll commonly adopt is 16 bit, which allows for a relatively fine-grained distinction between levels of intensity in the signal. A one-minute (uncompressed) 16-bit stereo recording of CD quality takes up approximately 10.5 MB of space on your hard disk, while reducing the number of channels to one immediately halves the file size, leaving us with approximately 5.25 MB, which is still quite a lot of disk space. However, since human speech rarely contains any important information above 8 or maybe 11 kHz, we can save further disk space by using sampling rates of 22.05 or even 16 kHz for most of our analyses, so that we end up with somewhere between roughly 1.9 and 2.6 MB of data per one-minute recording.

You may have noticed that the sampling rates recommended above are actually double the maximum frequencies that may be of interest to us. This is due to an effect called aliasing, which we need to avoid and which occurs if we sample at too low a rate. According to the Nyquist theorem, the minimum sampling rate needed to avoid aliasing is exactly double the highest frequency to be analysed.
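
To make this arithmetic concrete, here is a small worked example in Python (the language, the function name and the use of decimal megabytes are merely illustrative choices, not part of the materials above): the size of an uncompressed recording is simply sampling rate × bytes per sample × number of channels × duration.

    # Size of an uncompressed (PCM) recording:
    # sampling rate x bytes per sample x channels x duration in seconds.
    def uncompressed_size_mb(sample_rate_hz, bits_per_sample, channels, seconds):
        size_bytes = sample_rate_hz * (bits_per_sample // 8) * channels * seconds
        return size_bytes / 1_000_000  # decimal megabytes

    print(uncompressed_size_mb(44100, 16, 2, 60))  # CD quality, stereo -> 10.584 (approx. 10.5 MB)
    print(uncompressed_size_mb(44100, 16, 1, 60))  # CD quality, mono   -> 5.292
    print(uncompressed_size_mb(22050, 16, 1, 60))  # 22.05 kHz, mono    -> 2.646
    print(uncompressed_size_mb(16000, 16, 1, 60))  # 16 kHz, mono       -> 1.92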

Some Common Formats

As will have become clear from the discussion above, when storing sound files on your computer, one of the main differences between the various types/formats is whether or not they are compressed. Uncompressed files tend to be rather large and may take up a lot of valuable disk space, but compression – unless it is lossless – carries with it the risk of not being able to recover all the original information when the file is decompressed again. However, van Son (2002b) concludes that it is generally acceptable to use certain standard compressed formats for audio analysis. Below, we’ll list a few of the most common formats you may encounter:

Some programs may only ‘understand’ the .wav format. Because of this, and also because most types of compression tend to be lossy, it is advisable to store any recordings in the .wav format first and then save copies in other formats for use in tools that can work with them.
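
If you want to check what exactly is stored in one of your .wav files before analysing it, the parameters discussed earlier (number of channels, quantisation, sampling rate) can be read directly from the file header. The following sketch uses Python’s standard wave module; the file name recording.wav is only a placeholder for one of your own recordings.

    import wave

    # Read the header of an (uncompressed PCM) .wav file.
    with wave.open("recording.wav", "rb") as wav_file:
        channels = wav_file.getnchannels()      # 1 = mono, 2 = stereo
        sample_width = wav_file.getsampwidth()  # bytes per sample; 2 = 16 bit
        sample_rate = wav_file.getframerate()   # samples per second (Hz)
        n_frames = wav_file.getnframes()        # samples per channel

    print("channels:     ", channels)
    print("quantisation: ", sample_width * 8, "bit")
    print("sampling rate:", sample_rate, "Hz")
    print("duration:     ", round(n_frames / sample_rate, 2), "s")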

Obtaining Recordings

There are a number of websites available that provide recordings of different accents of English. Below are two ‘stable’ ones that you can use as a starting point:

When you download any of the recordings, it is usually best not to simply click on the relevant link because, most of the time, the sound file will then open directly in the browser’s associated plugin or player. It is therefore generally better to right-click on the link and choose the ‘Save (Link) As...’ or equivalent function of your browser.
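
If you prefer to script the download rather than save each file by hand, the same can be done with Python’s standard urllib module. The URL in the sketch below is a made-up placeholder, not an address from either of the sites mentioned above.

    from urllib.request import urlretrieve

    # Placeholder address – replace it with the actual link you would
    # otherwise right-click on in the browser.
    url = "https://example.org/accents/sample_recording.wav"

    local_path, headers = urlretrieve(url, "sample_recording.wav")
    print("Saved to", local_path)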

If you cannot find any appropriate materials on these two sites, you can either do a web search for “sound archives” to find further useful materials or make your own recordings. However, we do not have enough time on this course to go into detail about how best to make high-quality recordings and transfer them to the computer, so please refer to the relevant literature for this.

Let’s practise downloading a few audio files from the websites given above, so that we can later use them with our analysis programs.


Sources & Further Reading:

Audio File Format FAQ

Van Son, R.J.J.H. (2002b). Can standard analysis tools be used on decompressed speech? Paper presented at the COCOSDA2002 meeting, Denver. URL: http://www.cocosda.org/meet/denver/COCOSDA2002-Rob.pdf