Word Formation

As pointed out earlier, word-formation tries to explain the processes through which we can create new word forms. We’ve already seen some of these at work when we looked at morphemes and word classes, but now we’ll investigate them a little more closely, initially using exploratory methods again, rather than just looking at long lists of morphemes and listing their functions.

‘Strategies’ for Creating New Words

As far as morphological processes in word-formation are concerned, we can distinguish between a variety of major types, briefly introduced and summarised in the table below:

Major types of word formation processes
Process | Function
affixation | changing words by adding morphemes in the front or the back of a free morpheme or base; sub-divided into prefigation & suffigation
zero-derivation | changing the word class without changing the word shape
compounding | creating new words by combining (mainly) free morphemes
backformation | creating new words from phrases
clipping & blending | abbreviating or ‘fusing’ words into new ones
acronym formation | using initials to create short words

We’ll discuss each of these processes in some detail below.

Affixation: Inflection vs. Derivation

Affixation is the general process of attaching bound – rather than free – morphemes to a base. We can sub-divide the morphemes occurring in affixation processes further into the following types, based on their positions of attachment:

As we’ve already observed from Old English onwards in our previous examples, prefigation and suffigation (also referred to as prefixation & suffixation) are relatively common and straightforward processes. Especially in Old English, we saw that there were many more suffixes indicating inflectional features of nouns and verbs, such as case marking with agreement on adjectives and nouns in noun phrases (e.g. (mid) langum sċipum). Many of these have since disappeared from the English language, so that we now only have the limited set of inflectional options left that we noted in our discussion of word classes and their morphology.

In the frequency list for Old English, we were also still able to see examples of the Germanic circumfix, in those word forms that started with {ġe·} and ended in the allomorphs {on} or {an}. In terms of its underlying mechanism, infigation (infixation) – something we did not observe before – is a rather different thing in that it represents a feature that is relatively uncommon, and appears to be restricted solely to creating highly emphatic responses or exclamations that involve swear words.

To fully understand the word formation options affixation covers, we need to distinguish between its two major functions, the inflectional and the derivational one. Inflection, as we have seen, allows us to relate words to one another on the syntagmatic level, i.e. it indicates what kinds of roles they perform on the clause level, how they combine with other words, or what kind of tense/aspect they may express. Derivation, in contrast, makes it possible to create new words from old ones, either by changing their word class or by modifying/‘specifying’ their meanings. In terms of productivity, inflection has clearly diminished over time, whereas derivation still remains productive.

Although many affixes appear to have a relatively clearly defined function, recognising affix functionality is not always straightforward. Often what may superficially look like a specific affix (or a root) with a certain meaning may either be part of a longer unit, or not constitute an affix at all. Furthermore, we also encounter the same problem we saw earlier on (for instance with the {s} morpheme), i.e. that one and the same form may actually be multi-functional in representing a number of different meanings. In order to be able to understand this problem better, as well as to explore the potential functions of pre- and suffixes, let’s investigate some presumed pre- or suffixes by removing them and observing whether this may lead to potential misinterpretations.

This process of stripping off affixes is called stemming and is, these days, not only used in linguistics, but also in IT-related applications, such as search engines or information retrieval programs, in order to derive a root from a complex word form, thereby keeping track of related information without the need to list each word form (or lemma) of a particular paradigm.
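The basic idea of suffix stripping can be sketched in a few lines of Python. This is only a minimal illustration, not the program used on this page: the function name `split_suffix` and the tiny suffix inventory are assumptions made for the example, and a real stemmer uses far more rules.

```python
# Hypothetical, deliberately small suffix inventory for illustration only.
SUFFIXES = ["ness", "ing", "ed", "ly", "s"]

def split_suffix(word):
    """Split off the longest matching suffix and return (stem, suffix)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require some remaining material so a word is never reduced to nothing.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""

for word in ["happiness", "loved", "walking", "quickly"]:
    stem, suffix = split_suffix(word)
    print(f"{stem} + {suffix}" if suffix else stem)
```

Because the sketch only cuts letters off, it returns stems such as happi and lov rather than happy and love; a fuller stemmer would need extra rules to repair the spelling of the stem.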


suffix stemmer

The form below allows you to test the effect of stemming for the individual words listed in the dropdown boxes. Simply select a particular word and click on the stem word button. The input word will then be split for you and the two parts appear in the text box on the right, separated by a + symbol.

One thing you should at least have observed when looking at the results of the suffix stemming operations above is that the resulting stem may occasionally either not end in a vowel letter (generally because an <e> is missing) or the vowel may be ‘wrong’ (as in {happi}). The absence of the <e> is basically explicable in two ways:
  1. by attaching the {ed} for ED-forms, we would otherwise duplicate the <e>, which would then, following standard pronunciation conventions, have to be pronounced /iː/, and is thus to be avoided,
  2. due to the loss of the final <e> in Middle English, this letter has become ‘phonetically irrelevant’ and thus no longer influences the pronunciation. This also works better as an explanation for ing-forms.

The change in the vowel letter for final <i>/<y> can easily be explained if we remember that final <y> is generally pronounced /i/ or /ɪ/ (with only a few exceptions, such as in mono-syllabic words ending in it), as well as the fact that, historically, the two letters were originally seen as interchangeable.


prefix stemmer


Reducing a word form too far is referred to as overstemming. This often happens when combinations of letters that look like suffixes are stripped off when the root form has already been reached, or when a change in the combination of letters in the word formation process has been missed by the rules. If you want to try out stemming for other words/suffixes not catered for by the two example programs above, you can test the Porter stemmer, one of the best known and widely used stemming algorithms, mainly used in internet applications, on the Morphology page of my Introduction to Linguistics course.
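Overstemming is easy to provoke with a naive stripper that keeps removing anything that looks like a suffix. The following sketch assumes a hypothetical function `strip_repeatedly` and a four-item suffix list; it is not the Porter algorithm, which uses much more careful conditions.

```python
SUFFIXES = ("ing", "ed", "er", "s")  # hypothetical, deliberately tiny inventory

def strip_repeatedly(word, min_stem=1):
    """Strip listed suffixes again and again while at least `min_stem` letters remain."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                changed = True
                break  # restart with the shortened word
    return word

print(strip_repeatedly("walkers"))              # two genuine suffixes removed
print(strip_repeatedly("singing"))              # overstemmed: the root 'sing' is cut as well
print(strip_repeatedly("singing", min_stem=3))  # a length guard stops at 'sing'
print(strip_repeatedly("never", min_stem=3))    # but the guard does not rescue 'never'
```

A minimum stem length is only a crude safeguard, as the last call shows: the letters <er> in never merely look like a suffix, and telling such cases apart needs either better rules or a word list.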


As mentioned before, the counterpart to reception is production. Production in affixation is characterised by the ability of certain affixes to attach to bases. As you probably already know, not all pre- or suffixes can simply be attached to any base. Let’s test this by trying out different combinations of pre-/suffixes and bases in the form below. If the resulting word form is correct, it will appear in green in the text box on the right, otherwise in red.

Different affixes exhibit different characteristics in terms of how frequently they are involved in word formation processes. Some affixes, such as the negation morphemes, are more productive than others.


Zero-derivation

In all the above examples, we’ve created new words or word forms by adding either a pre-, or a suffix, or even both, sometimes also adjusting the stem in order to make things fit together better. However, as we’ve already seen before, sometimes we encounter word forms that can function in different ways and it’s not easy to determine which word class exactly they belong to unless we investigate their morpho-syntactic behaviour. The use of a word form in a word class it ‘doesn’t belong’ to has commonly been assumed to be possible due to a process called zero-derivation or conversion, where a word changes its word class through the addition of a {∅}-suffix.

In many cases, it is the verb that is assumed to be the original form, which then generally ‘turns into’ a noun, as in I like to run marathons. vs. Let’s go for a run! or I need to think about it. vs. I need to have a (quick) think about it. At other times, it’s the other way round, for example in There’s a (big) ship in the harbour. vs. We need to ship some goods by tomorrow. Such instances are quite frequent in English, where verbal word forms can often be used as nouns and vice versa.

In rarer cases, we can also have conversion applying to other word classes, such as from adjective to noun, e.g. The poor man had no proper clothes to wear. vs. They gave some money to the poor., or preposition/particle to verb, e.g. Let’s go down to the river! vs. He can down a large beer in only a few seconds.

The decision about which form should be seen as the original one usually seems to be made on the basis of which one of the two forms is more common, but this is really impossible to say without full knowledge of the historical development of the word – and perhaps also unnecessary. What’s far more important to understand in this context is that many words or word forms in English allow us to use them with different morpho-syntactic functions and meanings, although they frequently share some core meaning – in other words, they are simply grammatically polysemous. The notion of core meaning, however, is something that may be difficult to define, as can be seen quite clearly in the form <ship> above, where the noun refers to ‘a vessel used to transport items’, and the verb the ‘act of transportation’ itself, although, these days, this act is frequently no longer performed by actually using this type of vessel. On the other hand, though, due to this very fact, we can actually make certain assumptions about the etymology of the verb, and that the noun may have been used first.

How frequent word forms that belong to two or more word classes in fact are in English can be seen from the table below, where the number of word classes is based on an analysis of word-class ambiguity in the 1 million-word BROWN Corpus.

Ambiguity in word classes, based on DeRose (1988)
Word Classes | Frequency
2 | 3,760
3 | 264
4 | 61
5 | 12
6 | 2
7 | 1 (still)

As this is only a fairly small corpus, this table may not really tell us the whole story, though, and we can assume that the overall number of grammatically polysemous words may be even higher. Furthermore, if we take into account that words can also take on a different function without changing their word class, as in the examples of compounding we’ve encountered before and will discuss in more detail in the next section, then this multi-functionality of word forms in English becomes even more striking.
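To get a feeling for the proportions in the ambiguity table above, the counts can be added up and expressed as shares. The short sketch below assumes that each table row pairs a number of word classes with a frequency (so that the first row reads 2 word classes, 3,760 forms); the variable names are made up for the example.

```python
# Counts copied from the table: number of word classes -> frequency.
ambiguity = {2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

# Total number of forms that belong to more than one word class.
total = sum(ambiguity.values())
print(f"ambiguous forms in total: {total}")

for n_classes, freq in ambiguity.items():
    share = freq / total
    print(f"{n_classes} word classes: {freq:>5} ({share:.1%} of all ambiguous forms)")
```

On these figures, the large majority of the ambiguous forms are ambiguous between just two word classes, while the higher degrees of ambiguity are rare.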


Compounding

Compounding allows us to create new, and more complex or precise words by combining word forms with lexical meaning. Often, these word forms constitute free morphemes, but, as we’ve seen before, they can also be bound morphemes (e.g. cranberry ones), or other forms that are originally derived from inflected verb forms (e.g. dancing girl or scented oil). We can basically distinguish between three different types of compounds, based on their semantic properties:

Endocentric compounds can be sub-divided into two different categories, those where the first element specifies a particular quality, instrument or ‘producer’ of the second element (e.g. olive oil = ‘oil made from olives’), or those where the first element specifies a particular usage or ‘recipient’ or purpose (e.g. baby oil = ‘oil made for babies’). Further examples are blackbird (adjective+noun), spoonfeed (noun+verb) or nationwide (noun+adjective/adverb).


  1. Analyse the compounds listed in the box below in terms of their composition, i.e. according to original and resulting word classes, as well as type or sub-type, as applicable.
  2. Also try to explain their exact meaning & how they may have come into existence.
  3. As before, if you don’t know a particular word, look it up.
  4. Save your results.

Backformation

Backformation is a process in which an existing, or sometimes only presumed, suffix is ‘removed’ in order to change the word class. It thus involves a (presumed) shortening of an original word, which may itself have been the product of an earlier word-formation process. Examples of the removal of genuinely existing suffixes would be (to) babysit ← babysitter (‘someone who sits and keeps an eye on the baby’), (Am. En.) (to) housekeep ← housekeeper (‘someone who keeps the house in order’), both originally derived from complex noun phrases, or mass-produce ← mass production, derived from a compound. Instances that involve the deletion of presumed suffixes, because the endings are seen as analogous to existing suffixes, such as the -{er} or -{or} suffix that creates agentive nouns, are (to) edit ← editor, (to) word process ← word processor, (to) beg ← beggar.

Although backformation is still a productive process, as we can see in the relatively recent coinage of (to) word process, it’s important to understand that not all such ‘coinages’ may be generally acceptable. It may thus be best to check whether an assumed backformation actually exists before using it, especially in more formal writing. In some cases, we also have the same problem as with zero-derivations, in that it may be difficult to see which form existed first.

Clipping & Blending

Clipping and blending are very similar word-formation processes. Clipping is achieved by reducing a polysyllabic word to a monosyllabic one, as in lab ← la.bo.ra.tory/la.bo.ra.to.ry (1←4 or 5, depending on pronunciation), gym ← gym.na.si.um (1←4) or flu ← in.flu.en.za (1←4). It thus only affects single word forms.

Blending, on the other hand, is similar to compounding in the sense that parts of two existing word forms are combined. Both of the original word forms undergo a reduction process similar to clipping, only that the remaining parts may be significantly shorter than whole syllables (e.g. brunch ← breakfast + lunch), sometimes as short as a single letter that is not equal to a syllable (e.g. in blog ← web + log). Some parts may also be shared by both words, and it then becomes impossible to say which word form contributed the element (e.g. smog ← smoke + fog).
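The problem of shared elements can be made visible with a small sketch that lists every way of cutting a blend into one piece taken from the first source word and one piece taken from the second. The function `blend_splits` is a hypothetical helper written for this illustration, and matching any sequence of letters inside the source words is a strong simplification that works on spelling only.

```python
def blend_splits(blend, first, second):
    """Return every cut of `blend` into a piece of `first` plus a piece of `second`."""
    splits = []
    for i in range(1, len(blend)):
        head, tail = blend[:i], blend[i:]
        # Both pieces must occur as unbroken letter sequences in their source word.
        if head in first and tail in second:
            splits.append((head, tail))
    return splits

print(blend_splits("brunch", "breakfast", "lunch"))
print(blend_splits("blog", "web", "log"))
print(blend_splits("smog", "smoke", "fog"))
```

For brunch and blog, only one cut is possible, whereas smog can be cut either as sm + og or as smo + g, because the <o> occurs in both smoke and fog.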

Acronym Formation

Acronyms are words that are formed by taking individual letters – usually, but not always, the initials – from multiple words and combining them into a new word, generally in all upper-case (capital) letters. Examples of commonly known acronyms are: NASA (National Aeronautics & Space Administration), NATO (North Atlantic Treaty Organisation), UNO (United Nations Organization), WHO (World Health Organization), IMF (International Monetary Fund), HTML (Hypertext Markup Language). Many acronyms can also be pronounced as one word, whereas others, such as WHO, IMF, and HTML above, need to be spelled out.

The acronym generator below allows you to create your own acronyms, based on the initials of a sequence of words you choose. Just type the sequence into the textbox on the left and click on the button to create the acronym.
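The mechanism behind such a generator can be sketched roughly as follows. The function `make_acronym` and its list of skipped words are assumptions made for this example and do not describe how the generator on this page is actually implemented.

```python
def make_acronym(phrase, skip=("and", "of", "the", "&")):
    """Join the upper-cased initial of each word, ignoring a few function words."""
    # The skip list is a hypothetical choice; other generators may keep every word.
    words = [w for w in phrase.split() if w.lower() not in skip]
    return "".join(w[0].upper() for w in words)

for phrase in [
    "National Aeronautics & Space Administration",
    "North Atlantic Treaty Organisation",
    "International Monetary Fund",
    "Hypertext Markup Language",
]:
    print(phrase, "->", make_acronym(phrase))
```

The last phrase comes out as HML rather than HTML, which illustrates that established acronyms do not always consist of initials only: here, the <T> is taken from the middle of Hypertext.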


Sources & Further Reading:

Adams, V. (1993). An Introduction to Modern English Word-formation. London: Longman.

Bauer, L. (1983). English Word-formation. Cambridge: CUP.

DeRose, S. (1988). Grammatical Category Disambiguation by Statistical Optimization. Computational Linguistics 14 (1), pp. 31-39.

Payne, T. (1997). Describing Morphosyntax: a Guide for Field Linguists. Cambridge: CUP.

Plag, I. (2003). Word-Formation in English. Cambridge: CUP.