Aquaforest SDK

OCR and Data Extraction SDK

A robust SDK for C# and VB applications to extract data from PDFs. Leverage additional SDK capabilities like handwriting OCR, text extraction, document compression, barcode scanning, and more.


Aquaforest SDK is a powerful toolset for processing PDFs including:

  • PDF content extraction
  • Searchable PDF Creation
  • OCR with Standard (Aquaforest) Engine
  • OCR with Extended (Canon IRIS) Engine
  • Handwriting OCR options via Google & Microsoft APIs
  • Advanced PDF and Barcode Toolkit
  • High Performance with Support for up to 64 Cores

Main Features

PDF Data Extraction

The SDK is able to analyse PDF documents and automatically extract name/value pairs.

PDF Tools

The SDK has a wide variety of PDF manipulation capabilities including PDF merging, PDF attachment processing, PDF content extraction, XMP metadata processing, PDF/A validation and more.

Standard OCR

The Standard OCR Engine supports 23 languages (see the full list) and is included in every edition of the SDK.

Extended OCR

The Extended OCR Engine supports 131 languages (see the full list) and is included in the Extended Edition licenses.

Cloud OCR

The SDK provides an interface to the Google and Microsoft cloud OCR services, which can be especially useful for special cases such as handwriting recognition.


Barcode Decoding

The SDK can read and recognise most standard barcode types.

Get a Quote

Please contact the sales team for pricing information.


License Comparison Table

Edition Comparison Standard Extended
PDF Toolkit
Data Extraction from PDF documents without the need for templates or prior training
Barcode Decoding
OCR from bitmap, TIFF and PDF
Microsoft Cloud OCR (requires additional Microsoft Subscription)
Google Cloud OCR (requires additional Google Subscription)
Image Pre-Processing and Auto-Rotation
.NET Programmatic and Zonal access to OCR results
RTF and TXT output
Blank Page Removal
PDF Merging
Searchable PDF Output
Stamps on PDF Output
Advanced MRC and JBIG2 Compressed PDF Output
Advanced Pre-processing (Optimized OCR)
Aquaforest OCR Support for 23 languages
Extended IRIS OCR with Support for 131 languages
Support for multiple languages within a single document from the same character set
Multiple document output formats:
Multiple PDF version output support
Confidence score support
Asian Language Support
Arabic Language Support
Hebrew Language Support
Intelligent High Quality Compression


The Standard bundle includes support for 23 languages.

The Extended bundle includes support for 131 languages.

The Extended bundle language list includes Chinese (Traditional and Simplified), Japanese, Korean, Thai and Vietnamese.

We can demonstrate the product for you and discuss how it can meet your needs.

Our team has gained extensive experience and expertise in searchable PDFs over many years, and we are members of the PDF Association. We are happy to share our knowledge and provide free advice in this area.

We aim to respond to email support requests within half a business day, and usually much sooner. Email support@aquaforest.com with any support query.

Phone support
If you prefer to speak directly with our team, call us on +44 (0)1296 768 727 or request a call via support@aquaforest.com.

Live chat
You can always contact us on live chat during office hours.

Tech Spec

Searchable PDFs

Aquaforest’s OCR engine, capable of processing thousands of pages per hour, is used to recognise text from source TIFF and Image-Only PDF files and to create Searchable PDF files.

PDF Data Extraction

The Aquaforest Data Extractor allows data extraction from PDF documents without the need for templates or prior training. The software is able to read the PDF text and extract important key-value pairs automatically, making processing of files with various layouts easy.
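
To give a feel for what template-free key-value extraction means, here is a minimal sketch in Python. It is not the Aquaforest Data Extractor and does not use the SDK's API; the function name `extract_pairs` and the simple "Label: value" heuristic are assumptions made purely for illustration. A production extractor would also have to handle labels and values that are separated by layout rather than by a colon.

```python
import re

# Hypothetical heuristic: a short label, a colon, then a non-empty value.
PAIR = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 .#/]*?)\s*:\s*(\S.*?)\s*$")


def extract_pairs(text: str) -> dict[str, str]:
    """Return the 'Label: value' pairs found in already-extracted PDF text."""
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            key, value = match.group(1), match.group(2)
            # Keep the first occurrence of each label.
            pairs.setdefault(key, value)
    return pairs


if __name__ == "__main__":
    sample = (
        "Invoice No: 12345\n"
        "Date: 2021-03-04\n"
        "Thank you for your business\n"
        "Total: 99.50 GBP\n"
    )
    for key, value in extract_pairs(sample).items():
        print(f"{key} = {value}")
```

Lines without a label, such as the thank-you line in the sample, are simply skipped, so no per-layout template is needed for this simple case.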

Image Preprocessing

For optimal OCR recognition, options are available to control deskew, despeckle, graphics area treatment and auto-rotate.
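
As a rough illustration of one of these steps, the sketch below shows a very simple form of despeckling in Python: it clears black pixels that have no black neighbours. This is an assumption-laden toy, not the SDK's preprocessing code; the `despeckle` function and the list-of-lists image representation (1 for black, 0 for white) are hypothetical.

```python
def despeckle(image: list[list[int]]) -> list[list[int]]:
    """Return a copy of a binary image with isolated black pixels removed."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count black pixels in the 3x3 window, excluding the pixel itself.
            neighbours = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


if __name__ == "__main__":
    page = [
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
    ]
    for row in despeckle(page):
        print(row)
```

The lone pixel in the top-left corner is treated as noise and cleared, while the two adjacent pixels, which could be part of a character stroke, are kept.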

Simple .NET integration

The SDK has been designed to be simple to integrate with .NET applications and complete samples are provided in C#, VB.NET and ASP.NET.

Fully Searchable PDF Generation

The SDK can be used to generate fully text searchable PDFs with the original image and a transparent text layer.

System requirements

Supported Operating Systems:

  • Windows 10
  • Windows Server 2012 R2
  • Windows Server 2016
  • Windows Server 2019

Minimum Memory:

  • Single Core License: 4 GB RAM

Recommended Memory:

  • Single Core License: 8 GB RAM
  • 8 Core License: 16 GB RAM
  • Greater Than 8 Core License: ask support@aquaforest.com

Recommended CPU:

  • Single Core License: i5 processor
  • 8 Core License: i7 processor
  • Greater Than 8 Core License: ask support@aquaforest.com

Disk Space:

  • Clean Install: 1.31 GB
  • All Samples Compiled: 4.75 GB

.NET Framework: 4.7.2

Visual C++ Runtime: The Visual C++ Redistributable package is required for deployment as well as development. The Aquaforest engine requires the Visual C++ 2017 Redistributable (x86 | x64).

Autobahn DX

Start using Autobahn DX today to convert your archives to fully text-searchable PDFs.