
Extract pages by text

Uses text matches in a PDF file to extract pages from it. You can also generate file names for the split files based on the text matches.

Input Parameters

Required Parameters

File Name:
Data Type = string
The name of the source file. It is used by the file name template.

File Content:
Data Type = string (byte – base 64 string)
The content of the source file. If you are passing it from code, convert it to a base64 string first; Power Automate handles this conversion for you.
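When calling the action from code, the file bytes have to be base64-encoded before they are sent as File Content. A minimal Python sketch using only the standard library; the function names `encode_file_content` and `encode_pdf` are illustrative and not part of the API:

```python
import base64
from pathlib import Path


def encode_file_content(data: bytes) -> str:
    """Return raw file bytes as a base64 string for the File Content parameter."""
    # b64encode returns bytes; decode to ASCII to get a plain string for a JSON payload.
    return base64.b64encode(data).decode("ascii")


def encode_pdf(path: str) -> str:
    """Read a PDF from disk and return its base64-encoded content."""
    return encode_file_content(Path(path).read_bytes())
```

For example, `encode_pdf("invoice.pdf")` (a hypothetical file) returns the string to pass as File Content.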

File Name Template:
Data Type = string
Template for the output file name when a text match is found. Any occurrence of the variables listed below is replaced by the corresponding value at runtime.

  • %VALUE1%: The text extracted from the first zone. If no zone is provided, all the text on the page is returned.
  • %VALUE2%, …, %VALUEn%: The text extracted from the corresponding zone (%VALUEn% holds the text from the nth zone).


No File Template:
Data Type = string
Template for the output file name when no text match is found.
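The placeholder substitution and the fallback to the No File Template can be sketched in Python as follows. This is a minimal illustration, not the service's implementation: the function name `build_file_name` is hypothetical, and leaving a placeholder untouched when there is no matching zone is an assumed behavior.

```python
from __future__ import annotations

import re

# Matches %VALUE1%, %VALUE2%, ... and captures the zone number.
PLACEHOLDER = re.compile(r"%VALUE(\d+)%")


def build_file_name(template: str, no_match_template: str, values: list[str] | None) -> str:
    """Fill %VALUEn% placeholders, or fall back to the No File Template when nothing matched."""
    if not values:
        # No text match: use the No File Template as-is.
        return no_match_template

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1  # %VALUE1% maps to the first zone
        if 0 <= index < len(values):
            return values[index]
        return match.group(0)  # assumption: unknown placeholders are left unchanged

    return PLACEHOLDER.sub(substitute, template)
```

For example, with the template `Invoice_%VALUE1%.pdf` and a first zone that yields `1001`, the sketch produces `Invoice_1001.pdf`.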

Optional Parameters

Text Zones:
Data Type = Object []
A collection of zones used to extract text from the PDF file. Each member of this collection contains the properties listed below and produces a text output that corresponds to %VALUEn% in the File Name Template above: the first zone fills %VALUE1%, the second fills %VALUE2%, and so on.

Text Location:
Data Type = string
The coordinates of a rectangle that covers the text you want to extract. You can use this page to get the coordinates relative to your input files.
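Conceptually, a zone keeps only the words whose position falls inside the rectangle. The string format of Text Location is not described here, so this Python sketch assumes a hypothetical `left,top,right,bottom` string and a hypothetical `Word` record carrying each word's position; it illustrates the idea only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # hypothetical left edge of the word
    y: float  # hypothetical top edge of the word


def parse_location(location: str) -> tuple[float, float, float, float]:
    """Parse an assumed 'left,top,right,bottom' string into four floats."""
    left, top, right, bottom = (float(part) for part in location.split(","))
    return left, top, right, bottom


def text_in_zone(words: list[Word], location: str) -> str:
    """Join, in order, the words whose top-left corner lies inside the rectangle."""
    left, top, right, bottom = parse_location(location)
    inside = [w.text for w in words if left <= w.x <= right and top <= w.y <= bottom]
    return " ".join(inside)
```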

Text Page Number:
Data Type = integer
The page number to extract text from. If it is empty, we try each page until we find a match.
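The page-selection rule can be sketched in Python as below, assuming the text of each page is already available as a string. The function `find_on_pages` and its `matcher` callback are hypothetical; the matcher stands in for whatever zone, pattern, and selection rules are configured.

```python
from __future__ import annotations

from typing import Callable, Optional


def find_on_pages(
    pages: list[str],
    matcher: Callable[[str], Optional[str]],
    page_number: Optional[int] = None,
) -> Optional[tuple[int, str]]:
    """Return (page_number, result) for the first page whose text matches, else None.

    pages holds the already-extracted text of each page; page numbers are
    1-based, like the Text Page Number parameter.
    """
    if page_number is not None:
        # A page was specified: check only that page (out-of-range pages match nothing).
        if not 1 <= page_number <= len(pages):
            return None
        candidates = [(page_number, pages[page_number - 1])]
    else:
        # No page specified: try each page in order until one matches.
        candidates = list(enumerate(pages, start=1))
    for number, text in candidates:
        result = matcher(text)
        if result is not None:
            return number, result
    return None
```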

Text Pattern:
Data Type = string
A regular expression. If one is provided, we match the extracted text against it and return the match.
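A minimal Python sketch of applying a Text Pattern to extracted text. It uses Python's `re.search` and returns the whole match; the service's own regular expression dialect is not documented here and may differ, and the function name `apply_text_pattern` is hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional


def apply_text_pattern(extracted: str, pattern: Optional[str]) -> Optional[str]:
    """Return the first regex match in the extracted text, or the text unchanged if no pattern is set."""
    if not pattern:
        return extracted
    match = re.search(pattern, extracted)
    # Assumption: the whole match (group 0) is returned; None means no match.
    return match.group(0) if match else None
```

For example, the pattern `INV-\d{4}-\d{4}` pulls an invoice number such as `INV-2024-0042` out of a longer line of extracted text.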

Text Select:
Data Type = string
Refines the extracted text further. Select the option that matches your requirements:

  • text in zone: Returns all the text extracted from the zone.
  • word after value: Returns the word that appears immediately after the Text Value supplied below.
  • word before value: Returns the word that appears immediately before the Text Value supplied below.
  • all text in line after value: Returns all the words on the same line after the Text Value supplied below.
  • all text in line before value: Returns all the words on the same line before the Text Value supplied below.
  • all text in zone after value: Returns all the words in the selected zone after the Text Value supplied below.
  • all text in zone before value: Returns all the words in the selected zone before the Text Value supplied below.

Text Value:
Data Type = string[]
One or more values to use with the Text Select option above. We return the result for the first value that matches the selected rule.
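The Text Select options and the Text Value list can be sketched together in Python, treating a zone's text as a plain string with newline-separated lines. This is an approximation of the documented rules, not the service's code: the functions `select_text` and `_select` are hypothetical, only the first occurrence of each value is used, and an empty result is treated as "no match" so the next value is tried.

```python
from __future__ import annotations

from typing import Optional


def _select(zone_text: str, option: str, value: str) -> Optional[str]:
    """Apply one value-based Text Select option; None if the value is absent."""
    index = zone_text.find(value)
    if index == -1:
        return None
    before = zone_text[:index]
    after = zone_text[index + len(value):]
    if option == "word after value":
        words = after.split()
        return words[0] if words else None
    if option == "word before value":
        words = before.split()
        return words[-1] if words else None
    if option == "all text in line after value":
        return after.split("\n", 1)[0].strip()
    if option == "all text in line before value":
        return before.rsplit("\n", 1)[-1].strip()
    if option == "all text in zone after value":
        return after.strip()
    if option == "all text in zone before value":
        return before.strip()
    raise ValueError(f"unknown Text Select option: {option}")


def select_text(zone_text: str, option: str, values: list[str]) -> Optional[str]:
    """Try each Text Value in order and return the first non-empty result."""
    if option == "text in zone":
        return zone_text  # no value is needed for this option
    for value in values:
        result = _select(zone_text, option, value)
        if result:
            return result
    return None
```

For example, with the zone text `Invoice No: 1001 Date: 2024-01-31`, the option "word after value" and the values `["Ref:", "Invoice No:"]`, the sketch skips `Ref:` (not found) and returns `1001`.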

Output Parameters

Extracted Output Files:
Data Type = object[]
An array of the extracted files and their corresponding file names. Each member contains the properties listed below.

File Content:
Data Type = string (byte – base 64 string)
A base64 string representation of the split file.

File Name:
Data Type = string
The file name for the extracted file.

Page Number:
Data Type = string
The page range that contains the page where the extraction occurred.
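When processing the result in code, each extracted file's content has to be base64-decoded before it is saved. A minimal Python sketch; the response keys `FileName` and `FileContent` are hypothetical, modeled on the parameter names above, so check them against the actual response before relying on them.

```python
from __future__ import annotations

import base64
from pathlib import Path


def decode_output_files(files: list[dict]) -> list[tuple[str, bytes]]:
    """Return (file name, decoded bytes) pairs for each extracted file."""
    # "FileName" and "FileContent" are hypothetical keys; match them to the real response.
    return [(item["FileName"], base64.b64decode(item["FileContent"])) for item in files]


def save_output_files(files: list[dict], out_dir: str) -> list[Path]:
    """Write each extracted file into out_dir and return the paths written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in decode_output_files(files):
        path = target / Path(name).name  # keep only the base name, dropping any directories
        path.write_bytes(data)
        written.append(path)
    return written
```

In this sketch, two extracted files that share a file name overwrite each other on disk; a real integration may want to add a suffix instead.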

Is Successful:
Data Type = boolean
A boolean value indicating whether the operation was successful.

License Info:
Data Type = string
Information about your API subscription key. It contains:

  • LicenseType
  • CallsRemaining
  • CallsMade
  • RenewalDate

Error:
Data Type = string
The error message returned by the operation, if any.
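A caller would typically check Is Successful before using the extracted files and surface Error otherwise. A minimal Python sketch; the response keys `IsSuccessful`, `Error`, and `ExtractedOutputFiles`, and the names `check_result` and `ExtractionError`, are hypothetical.

```python
from __future__ import annotations


class ExtractionError(RuntimeError):
    """Raised when the action reports that it did not succeed."""


def check_result(response: dict) -> list[dict]:
    """Return the extracted files, or raise with the Error message if the call failed."""
    # Hypothetical keys modeled on the output parameter names above.
    if not response.get("IsSuccessful", False):
        raise ExtractionError(response.get("Error") or "operation failed without an error message")
    return response.get("ExtractedOutputFiles", [])
```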