TIFF vs. PDF

An Overview of Their Merits

The choice of storage format for electronic documents can have significant and far-reaching consequences. This short white paper provides an overview of the TIFF and PDF document formats and discusses their relative merits as formats for electronic document storage.

PDF Background

PDF File Format

A PDF file encapsulates a complete description of a fixed-layout document, including the text, fonts, raster images, and vector graphics that make it up. The format supports images compressed with JPEG, JPEG 2000, JBIG2, and CCITT Group 3 and Group 4.
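As a rough illustration of the image support described above, the sketch below counts how often each image encoding is named in a PDF file. Inside a PDF these encodings appear as stream filter names (DCTDecode for JPEG, JPXDecode for JPEG 2000, JBIG2Decode for JBIG2, CCITTFaxDecode for Group 3 and Group 4). The function name count_image_filters is hypothetical, and the approach is a naive byte scan rather than a real PDF parser, so the counts should be treated as an indication only.

```python
import re

# PDF stream filter names for the image encodings mentioned in the text.
IMAGE_FILTERS = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"JBIG2Decode": "JBIG2",
    b"CCITTFaxDecode": "Group 3 / Group 4",
}


def count_image_filters(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each image filter name in the raw file bytes.

    Assumption: this is a plain byte scan, not a PDF parser. It can
    miscount if a name happens to occur inside binary stream data, and
    it ignores the abbreviated names allowed for inline images.
    """
    counts = {}
    for name, label in IMAGE_FILTERS.items():
        hits = len(re.findall(rb"/" + name + rb"\b", pdf_bytes))
        if hits:
            counts[label] = hits
    return counts


# Hypothetical fragment of a PDF body with two image dictionaries.
sample = (
    b"<< /Type /XObject /Subtype /Image /Filter /CCITTFaxDecode >>\n"
    b"<< /Type /XObject /Subtype /Image /Filter [/DCTDecode] >>\n"
)
print(count_image_filters(sample))
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to the same function.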

Types of PDF

Normal

This is the most common type of PDF and is typically created from a source document in an application such as Microsoft Word. It contains the full text of each page, with appropriate coding to define fonts, sizes, etc., and will provide a faithful print of the original.

Image Only

This is a PDF that has been created from one or more images, most commonly as a result of scanning a document either directly to PDF or by converting a scanned TIFF image to PDF. These files do not contain any searchable text and most often comprise a set of Group 4 or JBIG2 images in a PDF “wrapper”.

Searchable

A “Searchable” PDF is an “Image-Only” PDF that additionally contains a hidden layer of text generated by an OCR engine. This enables the file to be searched in the same fashion as a “Normal” PDF, and the text can be copied and pasted.

TIFF Format Background

The TIFF format was created by Aldus Corporation in the mid-1980s to provide a standard file format for the storage of scanned images. The TIFF specification is now controlled by Adobe, although no major update to the specification has taken place since 1992.

TIFF Format

It is important to understand that the TIFF format itself is a file format, not an image format. A TIFF file can be thought of as a container for one or more images each of which may be of a different type. The most common types of image included in TIFF files are shown in the table below.

Compression Type    Typical Usage
Group 4             Most commonly used for black-and-white (“bitonal”) scanned images.
Group 3             Used for faxes.
JPEG                Used for grayscale and color scanned documents. There are two definitions of JPEG in TIFF, type 6 and type 7 (the values of the Compression tag), and there have been interpretation issues with type 6 in particular. Consequently these images are not always well supported.
LZW                 Used for grayscale and color scanned documents. Due to historical patent issues this is not always well supported, although the patents expired in 2003 and 2004, so more recent software should provide good support.
Uncompressed        Often output from graphics applications.
Others              Other less commonly used schemes include ZIP, PackBits, and RLE.
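The container idea and the compression types listed above can be sketched in a few lines of Python. A classic TIFF file starts with an 8-byte header that points to a chain of image file directories (IFDs), one per image, and each IFD records its compression scheme in the Compression tag (tag 259). The function names list_tiff_compressions and build_demo_tiff are hypothetical, the demo file is a skeleton without pixel data, and BigTIFF files are not handled.

```python
import struct

# Values of the TIFF Compression tag (259) for the schemes in the table.
COMPRESSION_NAMES = {
    1: "Uncompressed",
    2: "CCITT RLE",
    3: "Group 3",
    4: "Group 4",
    5: "LZW",
    6: "JPEG (type 6)",
    7: "JPEG (type 7)",
    8: "ZIP (Deflate)",
    32773: "PackBits",
}


def list_tiff_compressions(data: bytes) -> list[str]:
    """Return one compression name per image (IFD) in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (offset,) = struct.unpack(order + "I", data[4:8])
    names, seen = [], set()
    while offset != 0 and offset not in seen:  # guard against IFD loops
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        compression = 1  # the default when the tag is absent
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, _type, _n = struct.unpack_from(order + "HHI", data, entry)
            if tag == 259:
                # A single SHORT value is stored inside the entry itself.
                (compression,) = struct.unpack_from(order + "H", data, entry + 8)
        names.append(COMPRESSION_NAMES.get(compression, f"Other ({compression})"))
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return names


def build_demo_tiff(compressions: list[int]) -> bytes:
    """Build a little-endian TIFF skeleton (no pixel data) for the demo."""
    out = bytearray(b"II" + struct.pack("<HI", 42, 8))
    for index, value in enumerate(compressions):
        is_last = index == len(compressions) - 1
        next_offset = 0 if is_last else len(out) + 2 + 12 + 4
        out += struct.pack("<H", 1)                        # one entry per IFD
        out += struct.pack("<HHIHH", 259, 3, 1, value, 0)  # Compression tag
        out += struct.pack("<I", next_offset)              # link to next IFD
    return bytes(out)


demo = build_demo_tiff([4, 7, 5])
print(list_tiff_compressions(demo))
```

The demo builds a three-image file whose images use Group 4, JPEG type 7, and LZW, showing that a single TIFF file can mix image types.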

TIFF Metadata Tags

Metadata may be stored in TIFF files in tag fields. There is a set of baseline TIFF tags which should be supported by all TIFF software; these include Make (the scanner manufacturer), DateTime, and Software. There is also a set of defined extension tags, and developers can additionally implement “Private” tags.
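To make the tag mechanism more concrete, the following sketch reads a few baseline text tags from the first image of a classic TIFF file. The tag numbers (271 Make, 272 Model, 305 Software, 306 DateTime, 315 Artist) come from the TIFF 6.0 specification. The function names read_ascii_tags and build_demo_tiff and the sample values ("Acme", "ScanApp 1.0") are made up for illustration, and only ASCII-typed tags are handled.

```python
import struct

# Baseline ASCII tags and their numbers from the TIFF 6.0 specification.
ASCII_TAGS = {271: "Make", 272: "Model", 305: "Software",
              306: "DateTime", 315: "Artist"}


def read_ascii_tags(data: bytes) -> dict[str, str]:
    """Return the baseline ASCII metadata tags found in the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (count,) = struct.unpack_from(order + "H", data, ifd)
    found = {}
    for i in range(count):
        entry = ifd + 2 + 12 * i
        tag, field_type, n = struct.unpack_from(order + "HHI", data, entry)
        if tag in ASCII_TAGS and field_type == 2:  # field type 2 is ASCII
            # Values of up to 4 bytes sit in the entry; longer ones are
            # stored elsewhere in the file and the entry holds an offset.
            if n <= 4:
                start = entry + 8
            else:
                (start,) = struct.unpack_from(order + "I", data, entry + 8)
            raw = data[start:start + n]
            found[ASCII_TAGS[tag]] = raw.split(b"\0")[0].decode("ascii", "replace")
    return found


def build_demo_tiff(tags: dict[int, str]) -> bytes:
    """Build a little-endian TIFF skeleton that holds only ASCII tags."""
    header = b"II" + struct.pack("<HI", 42, 8)
    ifd_size = 2 + 12 * len(tags) + 4
    entries, blob = b"", b""
    for tag in sorted(tags):  # IFD entries are sorted by tag number
        value = tags[tag].encode("ascii") + b"\0"
        if len(value) <= 4:
            entries += struct.pack("<HHI", tag, 2, len(value)) + value.ljust(4, b"\0")
        else:
            offset = 8 + ifd_size + len(blob)
            entries += struct.pack("<HHII", tag, 2, len(value), offset)
            blob += value
            if len(blob) % 2:  # keep the next value on an even offset
                blob += b"\0"
    return header + struct.pack("<H", len(tags)) + entries + struct.pack("<I", 0) + blob


demo = build_demo_tiff({271: "Acme", 305: "ScanApp 1.0",
                        306: "2011:03:15 10:30:00"})
print(read_ascii_tags(demo))
```

Extension and private tags are stored in the same 12-byte entry layout, so a reader that does not recognise a tag number can simply skip that entry, as this sketch does.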