Understanding File Formats & Their Properties

The table below lists some of the most common file formats containing text that may be of interest to us here, together with their usual extensions and properties. For most of these formats, examples are also provided of what they look like when viewed as plain text, outside the programs normally used to create or read them.

| Format | Extension | Properties |
|---|---|---|
| plain text | .txt or no extension | no formatting, just text, possibly with line breaks |
| web formats | .htm(l) (HTML), .xml (XML) | plain text with additional information contained in tags; rendering not based on page layout, but on instructions contained in the tags; to view the source text for this document, press Ctrl+u on the keyboard in Mozilla or click on ‘View→Source’ in IE |
| graphical/page description formats | .pdf (Portable Document Format; viewed as text; viewed as PDF), .ps(.gz) (PostScript) | the position of all elements on a page is described in a specific format that can be rendered by an appropriate reader; text export is possible, but may contain additional line breaks, based on the page layout |
| proprietary formats | .doc (prior to Office 2007) or .docx (MS Word; viewed as text; viewed as Word), .wpd (WordPerfect), etc. | usually stored in a specific binary format; export through optional filters |
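Because an extension may be missing or misleading, it can help to look at the first few bytes of a file instead. The sketch below is only an illustration: the function name `sniff_format` and the labels it returns are made up for this example, and it checks just a handful of well-known signatures (`%PDF-` for PDF, `%!PS` for PostScript, the gzip and ZIP headers, and the OLE2 header used by pre-2007 Word files). It does not tell a .docx file apart from any other ZIP archive, and it assumes UTF-8 when testing for plain text.

```python
def sniff_format(data: bytes) -> str:
    """Guess a broad file format from the leading bytes (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"%!PS"):
        return "postscript"
    if data.startswith(b"\x1f\x8b"):
        return "gzip"            # e.g. a compressed .ps.gz file
    if data.startswith(b"PK\x03\x04"):
        return "zip container"   # .docx files are ZIP archives
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2 container"  # used by pre-2007 .doc files
    # Tagged formats are plain text, so look for typical opening markup.
    head = data[:1024].lstrip().lower()
    if head.startswith((b"<?xml", b"<!doctype", b"<html")):
        return "markup"
    # Assumption: anything that decodes cleanly as UTF-8 counts as plain text.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown binary"
    return "plain text"
```

To try this on a real file, open it in binary mode (`open(path, "rb")`) and pass in what you read. For small files it is simplest to pass the whole content, since cutting the data off in the middle of a multi-byte character would make plain text look like binary.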

Go through each of the examples of different types of ‘text’ documents above and see how easy/difficult it is to identify where the actual text is.
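For the web formats, the same task can be approximated in code. The following is a minimal sketch using Python's standard `html.parser` module; the class name `TextExtractor` and the function `visible_text` are invented for this illustration, and the choice to skip only `<script>` and `<style>` content is a simplifying assumption rather than a complete definition of what counts as the 'actual text'.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []   # text fragments found between tags
        self._skip = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped block.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text fragments of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For instance, `visible_text("<p>Hello</p><script>var x = 1;</script><p>world</p>")` returns `"Hello world"`. Note that text in the document head, such as the page title, is kept as well, so the result is not identical to what a browser displays.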