Question

PDF and Quick Fields

asked on September 11, 2014

I have a PDF file (174 pages) that I'm looking to run through Quick Fields. I've set up a Zone OCR process for first page identification, so that a new document starts when the word "Payment" occurs in a certain part of the page. It's a digitally created PDF (not scanned), so there is text. My Universal Capture settings are as follows:

-Keep each file as a separate document

-Generate an Image for each page

-Convert Image to B&W

-Scale image to user 300dpi

-Extract the text from each page

 

However, when I run the PDF through Quick Fields, it identifies the whole file as a single document, and the Zone OCR info is not accurate. When I test the Zone OCR process on its own, though, it captures the information, and my PM works when I test it against that same info. I have changed my Zone field properties to "Use Existing Text". Is there something I'm missing as to why this isn't working at all?


Answer

SELECTED ANSWER
replied on September 12, 2014

Based on the session attached to the support case, this issue was caused by the Zone OCR process in Last Page Identification. Quick Fields was incorrectly waiting for it to indicate when the document is complete even though the document class was set to not use Last Page Identification. Deleting or disabling this process will allow your identification to work as expected. A bug report has been logged for this issue.
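For readers following along, the splitting behavior the question expects from first page identification can be sketched roughly as below. This is not Quick Fields code: the function name, the sample zone text, and the rule that the very first page always opens a document are all assumptions made for illustration.

```python
from typing import List


def split_on_first_page(zone_texts: List[str], keyword: str = "Payment") -> List[List[int]]:
    """Group 1-based page numbers into documents.

    A new document starts whenever the zone text of a page contains
    the keyword. Hypothetical helper, not part of Quick Fields.
    """
    documents: List[List[int]] = []
    for page_number, text in enumerate(zone_texts, start=1):
        # Assumption: the very first page always opens a document,
        # even if its zone text does not contain the keyword.
        if keyword in text or not documents:
            documents.append([page_number])
        else:
            documents[-1].append(page_number)
    return documents


# Made-up zone text for a five-page file.
pages = [
    "Payment Amount: 10.00",
    "details",
    "Payment Amount: 20.00",
    "details",
    "more details",
]
print(split_on_first_page(pages))
```

In this model only the first-page condition decides where a document begins, which is the behavior expected once the Last Page Identification process is deleted or disabled.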


Replies

replied on September 11, 2014

I don't think that "Use Existing Text" works unless the text has already been extracted. On a PDF that has text embedded, you want to use the "Extract Text" activity to pull out the text so that it can be used. At that point, Zone OCR will have something to work with.

replied on September 11, 2014

It's capturing the information now, but it's still not separating the pages. All 174 pages come in as one document even though I have my Page Identification set up.

replied on September 11, 2014

What conditions do you have in Page Identification?

replied on September 11, 2014

If any of the conditions are true

1) Identification Condition (Zone OCR 1) contains Payment

 

Here are the results from my test:

 

Information    1    Payment Amount Identification : Bank:121000248
Account: 4121612618(NJ)
Payment Amount:
Originator
Entry Class:
Oriqinator Companv Name:    Payment Amount Identification        0    

 

However, as you can see below, my zone shouldn't be capturing nearly all of this data. It seems that the PDF text is doing something?

 

replied on September 11, 2014

Sounds like the resolution of the sample image might not be the same as the images generated from the PDF. This is probably better suited for a support case where you can attach sample images and a session.

replied on September 11, 2014

When I change my identification criteria to just create a new document every 1 page, it works. But as soon as I take that away and use my Zone OCR condition, it creates just one single document.

replied on September 15, 2014

Miruna,

That fixed my issue with identifying the documents. However, when I go to store the documents, I'm seeing something else odd. I'm using Zone OCR to capture the date, then putting that date into a Date field in the template, and then using that Date field to store the documents. However, I'm getting 2 different date folders created.

When I look at the document metadata (template), it shows the date as 9/10/2014, but the folder it created (and the document name) is 09/10/2014.

The weird part is that it stored 133 documents correctly, but these 4 documents are stored here. I'm not sure why.
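To make the mismatch concrete: 9/10/2014 and 09/10/2014 are the same date but different strings, so anything keyed on the text will treat them as two names. The Python sketch below only illustrates that point. It is not how Quick Fields builds folder paths, `folder_name` is a hypothetical helper, and the month/day/year order is an assumption.

```python
from datetime import datetime


def folder_name(date_text: str) -> str:
    """Parse a month/day/year string and re-emit it zero-padded.

    Hypothetical helper for illustration; the m/d/Y order is assumed.
    """
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y")


# As plain text the two values differ, so they would name two folders.
print("9/10/2014" == "09/10/2014")

# Parsed as dates and formatted the same way, they collapse to one name.
print(folder_name("9/10/2014") == folder_name("09/10/2014"))
```

If the folder name comes from the captured text rather than from the Date field value, a difference like this would be enough to produce two folders. That is a guess, not something confirmed in this thread.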
