Could not find file ‘C:\Windows\TEMP\AquaForestOCR\nnnn_nnn\n_n.hocr’


When using the Aquaforest OCR SDK, your application may intermittently receive the following exception:

System.IO.FileNotFoundException was caught
FileName=C:\WINDOWS\TEMP\AquaforestOcr\xxxx_xx\x_x.hocr
Message=Could not find file ‘C:\WINDOWS\TEMP\AquaforestOcr\xxxx_xx\x_x.hocr’.

This exception is raised because the source file was not OCR'd, so the .hocr file the SDK expects to read does not exist. The wording of the message is misleading in this case. To resolve the issue, subscribe to the StatusUpdate event, which gives you access to StatusUpdateEventArgs. An instance of this class is supplied for each page processed and reports the processing outcome for that page.

Below are the properties of this class.

  • int PageNumber: The page to which this object relates.
  • int Rotation: A value from 0 to 3 giving the rotation applied to the output, expressed as the number of 90° steps away from the orientation in which the input page was provided. If AutoRotation is set to false, this is always 0.
  • double ConfidenceScore: Generally, a value of 1 or greater indicates reasonable OCR of the page, but this threshold should be confirmed by testing with "typical" source files.
  • bool TextAvailable: Indicates whether text was extracted for the page.
  • bool ImageAvailable: Indicates whether an image (after all appropriate pre-processing) was successfully extracted.
  • bool BlankPage: Indicates whether the page was detected as blank.
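To illustrate how these properties might be combined, here is a minimal, self-contained sketch that sorts pages into those that are safe to read and those that may be worth reviewing. It does not call the SDK: PageStatus and PageTriage are hypothetical types that mirror only the documented properties, and the 1.0 confidence threshold is an assumption based on the guidance above that should be tuned on your own typical files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for StatusUpdateEventArgs: it mirrors only the documented
// properties used here, so this sketch compiles without the Aquaforest SDK.
public sealed record PageStatus(int PageNumber, double ConfidenceScore, bool TextAvailable, bool BlankPage);

public static class PageTriage
{
    // Assumed threshold: a score of 1 or greater generally indicates reasonable OCR,
    // but the value should be confirmed against typical source files.
    public const double MinConfidence = 1.0;

    // Pages that produced text and are not blank, i.e. candidates for ReadPageString.
    public static List<int> ReadablePages(IEnumerable<PageStatus> statuses) =>
        statuses.Where(s => s.TextAvailable && !s.BlankPage)
                .Select(s => s.PageNumber)
                .OrderBy(n => n)
                .ToList();

    // Non-blank pages whose confidence falls below the threshold and may need review.
    public static List<int> LowConfidencePages(IEnumerable<PageStatus> statuses) =>
        statuses.Where(s => !s.BlankPage && s.ConfidenceScore < MinConfidence)
                .Select(s => s.PageNumber)
                .OrderBy(n => n)
                .ToList();
}
```

For example, three pages recorded as (1, 1.4, text, not blank), (2, 0.0, no text, blank) and (3, 0.6, text, not blank) give readable pages 1 and 3, with page 3 flagged as low confidence.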

Below is an example in C# that uses StatusUpdateEventArgs to avoid this error by calling ReadPageString only when text is available:


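using System;
using System.Collections.Concurrent;
// Also add the using directive for the Aquaforest OCR SDK namespace.

class Program
{
    // Text availability reported by the StatusUpdate event, keyed by page number.
    // A single bool would be overwritten for every page and end up holding only
    // the last page's value when the pages are read after Recognize.
    static readonly ConcurrentDictionary<int, bool> textAvailable = new ConcurrentDictionary<int, bool>();

    static void Main(string[] args)
    {
        try
        {
            Ocr _ocr = new Ocr();
            _ocr.License = "";
            PreProcessor _preProcessor = new PreProcessor();
            _ocr.EnableConsoleOutput = true;

            // Make the OCR engine files available on the PATH and as the resource folder.
            string OCRFiles = System.IO.Path.GetFullPath(@"..\..\..\..\..\..\bin");
            System.Environment.SetEnvironmentVariable("PATH",
                System.Environment.GetEnvironmentVariable("PATH") + ";" + OCRFiles);
            _ocr.ResourceFolder = OCRFiles;

            _preProcessor.Deskew = true;
            _preProcessor.Autorotate = false;
            _ocr.Language = SupportedLanguages.English;
            _ocr.EnablePdfOutput = true;

            // Subscribe before Recognize so the outcome of every page is captured.
            _ocr.StatusUpdate += OcrStatusUpdate;

            _ocr.ReadTIFFSource(System.IO.Path.GetFullPath(@"..\..\..\..\..\..\docs\tiffs\sample.tif"));
            if (_ocr.Recognize(_preProcessor))
            {
                string words = string.Empty;
                for (int j = 1; j <= _ocr.NumberPages; j++)
                {
                    try
                    {
                        // Only read pages that produced text; reading a page without
                        // text is what triggers the misleading .hocr FileNotFoundException.
                        if (textAvailable.TryGetValue(j, out bool hasText) && hasText)
                            words += _ocr.ReadPageString(j);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("ERROR reading page " + j + ": " + ex.Message);
                    }
                }
                _ocr.SavePDFOutput(System.IO.Path.GetFullPath(@"..\..\..\..\..\..\docs\tiffs\sample.pdf"), true);
            }
            _ocr.DeleteTemporaryFiles();
        }
        catch (Exception e)
        {
            Console.WriteLine("Error in OCR Processing: " + e.Message);
        }
    }

    // Called for each processed page with the outcome for that page.
    private static void OcrStatusUpdate(object sender, StatusUpdateEventArgs statusUpdateEventArgs)
    {
        textAvailable[statusUpdateEventArgs.PageNumber] = statusUpdateEventArgs.TextAvailable;
    }
}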

Neil Pitman founded Aquaforest Limited in 2001 and is the chief architect for the company’s PDF and OCR software products used by thousands of organizations ranging from NASA to the Dutch Ministerie van Justitie. Neil has 30 years’ experience in the software industry in the UK and USA in the areas of database systems, document management and software development tools and has served on the IDT committees of the British Standards Institute (BSI) and was a co-author of the BSI’s 2007 publication on the Long Term Preservation of Digital Documents.