The table below lists some of the most common file formats containing text that may be of interest to us here, together with their usual extensions and properties. For most of these types, examples are also provided of what the format looks like when viewed as plain text, outside of the programs that are normally used to generate or read it.
Format | Extension | Properties |
---|---|---|
plain text | .txt or no extension | no formatting, just text, possibly with line breaks |
web formats | .htm(l) (HTML), .xml (XML) | plain text with additional information contained in tags; rendering not based on page layout, but on instructions contained in the tags; to view the source text for this document, press Ctrl+u on the keyboard in Mozilla or click on ‘View→Source’ in IE |
graphical/page description formats | .pdf (Portable Document Format; viewed as text; viewed as PDF), .ps(.gz) (PostScript) | the position of all elements on a page is described in a specific format that can be rendered by an appropriate reader; text export is possible, but may contain additional line breaks, based on the page layout |
proprietary formats | .doc (prior to Office 2007) or .docx (MS Word; viewed as text; viewed as Word), .wpd (WordPerfect), etc. | usually stored in a specific binary format (.docx is a ZIP archive of XML files, which also looks like binary data when viewed as plain text); export through optional filters |
Go through each of the examples of different types of ‘text’ documents above and see how easy/difficult it is to identify where the actual text is.
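For the web formats, where the actual text sits between tags, the separation can also be done programmatically. The following is only a minimal sketch in Python, using the standard library's `html.parser` module; the class name `TextExtractor`, the function `extract_text` and the sample markup are made up for illustration, and real pages will usually need more careful handling (e.g. of navigation menus or comments).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping script and style content.

    Hypothetical helper for illustration only.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []        # pieces of actual text found so far
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(markup):
    """Return the text content of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample markup, not taken from any of the example files.
sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><p>Some <b>actual</b> text.</p></body></html>"
)
print(extract_text(sample))  # prints: Demo Some actual text.
```

The same approach does not carry over to the page description or proprietary formats, which is part of what makes the actual text in them harder to identify.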