Aquaforest OCR engine v Extended (IRIS) OCR engine


This article highlights the key features of each OCR engine to help you decide which one to use for your project.

 

Language Support:

The Aquaforest OCR engine supports 23 languages, primarily European; see this document for the list of supported languages.

The Extended (IRIS) OCR engine supports more than 127 languages, including Asian languages; see this document for the full list of supported languages.

The Extended (IRIS) OCR engine also lets you specify multiple languages so that several languages can be recognized in a single document. The languages must all be from the same character set.
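The same-character-set rule can be checked before a job is submitted. The sketch below is illustrative only: the `SCRIPT_OF` mapping and the `same_character_set` helper are hypothetical names, not part of either engine's API, and the real grouping of languages into character sets should be taken from the engine's language documentation.

```python
# Hypothetical mapping from language name to character set (script).
# The authoritative grouping comes from the engine's language list.
SCRIPT_OF = {
    "English": "Latin",
    "French": "Latin",
    "German": "Latin",
    "Russian": "Cyrillic",
    "Greek": "Greek",
    "Japanese": "Japanese",
}


def same_character_set(languages):
    """Return True if every requested language maps to the same script."""
    if not languages:
        raise ValueError("at least one language is required")
    unknown = [lang for lang in languages if lang not in SCRIPT_OF]
    if unknown:
        raise KeyError("no script known for: " + ", ".join(unknown))
    scripts = {SCRIPT_OF[lang] for lang in languages}
    return len(scripts) == 1
```

A check like this would accept English with French, and reject English with Russian, before any OCR work starts.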

 

Output File Formats:

The Aquaforest OCR engine can generate the following output file formats:

  • PDF
  • TXT
  • RTF

The Extended (IRIS) OCR engine can generate the following output file formats:

  • PDF
  • TXT
  • RTF
  • DOCX
  • EXCELML
  • HTML
  • CSV
  • XPS
  • XLSX
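When the required output format drives the choice of engine, the two lists above reduce to a simple lookup. The following is a minimal sketch: the function name `engines_supporting` and the engine labels are made up for illustration, and the format sets are copied from the lists in this article.

```python
# Output formats as listed in this article for each engine.
AQUAFOREST_FORMATS = {"PDF", "TXT", "RTF"}
EXTENDED_FORMATS = AQUAFOREST_FORMATS | {
    "DOCX", "EXCELML", "HTML", "CSV", "XPS", "XLSX",
}


def engines_supporting(output_format):
    """Return the engines that can produce the given output format."""
    fmt = output_format.strip().upper()
    engines = []
    if fmt in AQUAFOREST_FORMATS:
        engines.append("Aquaforest")
    if fmt in EXTENDED_FORMATS:
        engines.append("Extended (IRIS)")
    return engines
```

For example, `engines_supporting("pdf")` lists both engines, while `engines_supporting("docx")` lists only the Extended (IRIS) engine.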

 

Compression:

The Aquaforest OCR engine includes JBIG2 compression for black and white images and MRC for color images.

The Extended (IRIS) OCR engine offers the optional IHQC module, which enables IRIS's Intelligent High Quality Compression technology. It compresses PDFs strongly without compromising the visual quality, text resolution, or legibility of documents.

 

Pre-Processing Options:

Pre-processing options, such as de-skew and auto-rotate, can be applied to an image before recognition to improve OCR performance.

The Aquaforest OCR engine provides the following pre-processing options:

Auto-rotate: rotates the image if required.

Line removal: removes lines from the image.

De-skew: straightens the image.

De-speckle: removes specks from the image.

Binarize: controls whether binarization is performed on color images.

 

The Extended (IRIS) OCR engine provides a more comprehensive set of pre-processing options, listed below:

De-speckle: removes specks from the image.

Auto-rotate: rotates the image if required.

Line removal: removes lines from the image (the image must be black and white).

RemoveWhitePixels: By default, de-speckle removes black pixels. If set to true, de-speckle removes white pixels instead.

Binarization: Whether or not to perform binarization on the document.

Brightness: The brightness (higher values darken the result).

Contrast: The contrast (lower values darken the result).

SmoothingLevel: Smoothing can help when binarizing text on a colored background by avoiding noisy pixels (0 disables smoothing; higher values smooth more).

Threshold: Sets the threshold for fixed-threshold binarization (0 for automatic threshold computation).

HorizontalCleanX: The parameter for cleaning noisy pixels attached to the horizontal lines.

HorizontalCleanY: The parameter for cleaning noisy pixels attached to the horizontal lines.

VerticalCleanX: The parameter for cleaning noisy pixels attached to the vertical lines.

VerticalCleanY: The parameter for cleaning noisy pixels attached to the vertical lines.

HorizontalDilate: The dilate parameter that helps the detection of horizontal lines.

VerticalDilate: The dilate parameter that helps the detection of vertical lines.

HorizontalMaxGap: The maximum horizontal line gap to close. It is useful for removing broken lines.

VerticalMaxGap: The maximum vertical line gap to close. It is useful for removing broken lines.

HorizontalMaxThickness: The maximum thickness of the horizontal lines to remove. It is useful for keeping vertical lines larger than this parameter, and can also be useful for keeping vertical letter strokes.

VerticalMaxThickness: The maximum thickness of the vertical lines to remove. It is useful for keeping horizontal lines larger than this parameter, and can also be useful for keeping horizontal letter strokes.

HorizontalMinLength: The minimum length of the horizontal lines to remove.

VerticalMinLength: The minimum length of the vertical lines to remove.

RemoveDarkBorders: Removes the dark surround from bitonal, grayscale, or color images by whitening it. Note: the dark border must touch the edge of the page for this to work.

Interpolation: Interpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image's resolution.

InterpolationMode: Sets the interpolation mode.

KeepOriginalImage: Keeps the original image as it is.
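To make the Threshold option more concrete, here is a small pure-Python sketch of fixed-threshold binarization in which a value of 0 triggers automatic threshold computation. It is not the engine's implementation: the function names are hypothetical, the automatic branch uses Otsu's method as a stand-in because the engine's actual algorithm is not described here, and the choice to map pixels at or below the threshold to black is an assumption.

```python
def otsu_threshold(pixels):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold=0):
    """Binarize 8-bit grayscale rows; threshold=0 means 'compute automatically'."""
    flat = [v for row in image for v in row]
    # Assumption: 0 selects the automatic path, mirroring the option's description.
    t = threshold if threshold > 0 else otsu_threshold(flat)
    # Assumption: pixels at or below the threshold become black (0), others white (255).
    return [[0 if v <= t else 255 for v in row] for row in image]
```

On a page with clearly separated dark text and light background, the fixed and automatic paths should agree; they diverge mainly on low-contrast or unevenly lit scans, which is where a manually chosen threshold tends to matter.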

 

If you have any questions, please email the support team, who will be happy to assist you.

Neil Pitman founded Aquaforest Limited in 2001 and is the chief architect for the company’s PDF and OCR software products used by thousands of organizations ranging from NASA to the Dutch Ministerie van Justitie. Neil has 30 years’ experience in the software industry in the UK and USA in the areas of database systems, document management and software development tools and has served on the IDT committees of the British Standards Institute (BSI) and was a co-author of the BSI’s 2007 publication on the Long Term Preservation of Digital Documents.
