Get data from PDF

This action will extract important data from PDF files in the form of Key/Value pairs.

Input Parameters

Required Parameters

File Content:
Data Type = string (Base64-encoded bytes)
The content of the source file
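
Because File Content is a Base64 string, the raw PDF bytes have to be encoded before they are sent. The following is a minimal sketch of that step; the helper name `encode_file_content` and the placeholder bytes are illustrative assumptions, not part of this action.

```python
import base64


def encode_file_content(pdf_bytes: bytes) -> str:
    """Return the Base64 text form of raw PDF bytes."""
    return base64.b64encode(pdf_bytes).decode("ascii")


# In practice the bytes would come from a real file, for example:
#     with open("invoice.pdf", "rb") as handle:
#         pdf_bytes = handle.read()
# A short placeholder is used here so the sketch runs on its own.
sample = b"%PDF-1.7 minimal placeholder"
file_content = encode_file_content(sample)
```

Platforms that pass file content between actions often handle this encoding themselves, so the helper is mainly relevant when a request is built by hand.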

Optional Parameters

Expected Keys:
Data Type = string (One key per line)
Provide one key name per line to make values available to later actions without parsing JSON.
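
Since Expected Keys takes one key name per line, a list of names can be joined with newline characters. This is a small sketch under that reading; `build_expected_keys` is a hypothetical helper, and dropping blank entries is an assumption made here for tidiness.

```python
def build_expected_keys(keys):
    """Join key names into a single string with one key per line."""
    # Strip surrounding whitespace and drop empty entries (an assumption,
    # made so that stray blank lines are not sent as key names).
    cleaned = [key.strip() for key in keys if key.strip()]
    return "\n".join(cleaned)


expected_keys = build_expected_keys(["Invoice No", "Total ", "", "Due Date"])
```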

Page Limit:
Data Type = integer
Maximum number of pages to be processed

Page Range:
Data Type = string
A string representation of the page numbers you want to process. E.g. 1,3-4
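
To check a Page Range value before sending it, the string can be expanded locally into explicit page numbers. The sketch below assumes the format shown in the example (comma-separated pages and inclusive `start-end` ranges); how the service itself parses the string is not documented here, and `expand_page_range` is a hypothetical helper.

```python
def expand_page_range(spec: str) -> list:
    """Expand a string such as "1,3-4" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Treat "start-end" as an inclusive range (assumed semantics).
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Range start exceeds end: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("Page numbers are expected to start at 1")
    return pages


pages = expand_page_range("1,3-4")
```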

Dates as ISO:
Data Type = boolean
Set this to true if you want the date values to be returned as an ISO Date

Confidence Score:
Data Type = number
Set a higher confidence score to filter out values with lower confidence. The value must be between 0 and 1; 0.5 or above is recommended for meaningful key/value pairs.
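
The effect of the Confidence Score threshold can be pictured as a simple filter over extracted values. This is only an illustration: the field names `text` and `confidence` are hypothetical, and whether the service keeps values exactly at the threshold is an assumption.

```python
def filter_by_confidence(values, threshold=0.5):
    """Keep only the values whose confidence meets the threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    # Inclusive comparison is assumed; the service's exact rule is not stated.
    return [value for value in values if value["confidence"] >= threshold]


candidates = [
    {"text": "INV-001", "confidence": 0.93},
    {"text": "Page 1 of 2", "confidence": 0.21},
]
kept = filter_by_confidence(candidates, threshold=0.5)
```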

Strip Currency Symbols:
Data Type = boolean
Set this to true if you want currency symbols and strings to be removed from currency values before they are returned

Match Synonym:
Data Type = boolean
Set this to true if you want us to return all the keys that are synonyms of the expected key

Synonym Dictionary:
Data Type = string
You can provide a JSON array of "entry" objects, where each object contains an array of synonyms. For instance, if you want "Invoice No" and "Invoice Number" (case-insensitive) to be interpreted as the same key, use the following JSON: [{"entry": ["Invoice No", "invoice number"]}]
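
Building the Synonym Dictionary with a JSON serializer avoids quoting mistakes, since valid JSON requires straight double quotes. The sketch below produces the structure described above; `build_synonym_dictionary` is a hypothetical helper name.

```python
import json


def build_synonym_dictionary(groups):
    """Serialize groups of synonyms as a JSON array of "entry" objects."""
    return json.dumps([{"entry": list(group)} for group in groups])


synonym_dictionary = build_synonym_dictionary(
    [["Invoice No", "Invoice Number"], ["Total", "Amount Due"]]
)
```

Each inner list becomes one "entry" object, so every key in the same list is treated as a synonym of the others.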

Trim Symbols:
Data Type = boolean
Set this to true if you want us to remove all leading and trailing symbols from the keys found before we match them to an expected key.

Output Parameters

Expected Keys:
Data Type = string
Each expected key that was added to the request will have a corresponding string property in the response.

License Info:
Data Type = string
Information about your API subscription key. It contains:
  • LicenseType
  • CallsRemaining
  • CallsMade
  • RenewalDate

Error Message:
Data Type = string
Error message

Is Successful:
Data Type = boolean
This will be true if data was extracted from the file.

Pages:
Data Type = object[]
A list of pages containing Key/Value pairs.

Page Number:
Data Type = integer
The page number of the current page

Page Height:
Data Type = integer
The height of the current page

Page Width:
Data Type = integer
The width of the current page

Page Key/Value Pairs:
Data Type = object[]
A list of Key/Value pairs extracted from a page.

Key:
Data Type = object
Object representing a Key

Key Text:
Data Type = string
String showing the content of the key

Bounding Box:
Data Type = object
A rectangle representing the position of the key on a page.

Top:
Data Type = number
The top coordinate of the bounding box

Left:
Data Type = number
The left coordinate of the bounding box

Height:
Data Type = number
The height of the bounding box

Width:
Data Type = number
The width of the bounding box

Values:
Data Type = object[]
A list of objects, each representing a Value

Value Text:
Data Type = string
String showing the content of the value

Confidence:
Data Type = number
A score indicating how confident we are that this value matches the key.

Bounding Box:
Data Type = object
A rectangle representing the position of the Value Text on the page.

Top:
Data Type = number
The top coordinate of the bounding box

Left:
Data Type = number
The left coordinate of the bounding box

Height:
Data Type = number
The height of the bounding box

Width:
Data Type = number
The width of the bounding box
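
The nesting described above (pages, then Key/Value pairs, then a key with one or more values) can be flattened into simple rows for later processing. The sketch below is illustrative only: the JSON property names (`pages`, `pageNumber`, `keyValuePairs`, `key`, `values`, `text`, `confidence`) are hypothetical stand-ins for the display names listed in this section, and the sample response is invented to show the shape.

```python
def flatten_pairs(response):
    """Return (page number, key text, value text, confidence) rows."""
    rows = []
    for page in response.get("pages", []):
        for pair in page.get("keyValuePairs", []):
            key_text = pair["key"]["text"]
            # A key can carry several values, so emit one row per value.
            for value in pair.get("values", []):
                rows.append(
                    (page["pageNumber"], key_text, value["text"], value["confidence"])
                )
    return rows


# Invented sample using the hypothetical property names above.
sample_response = {
    "pages": [
        {
            "pageNumber": 1,
            "keyValuePairs": [
                {
                    "key": {"text": "Invoice No"},
                    "values": [{"text": "INV-001", "confidence": 0.93}],
                }
            ],
        }
    ]
}
rows = flatten_pairs(sample_response)
```

Before relying on any of these names, compare them with the properties in an actual response from the action.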