Question

Quick Fields PDF Keep Extracting text

asked on February 3, 2016

Hi

I have been stuck on this problem for a long time now. One of our clients wants to process around 500 1099-R forms saved in one PDF file. What I am trying to do is read each page, extract the account number using Zone OCR, and create a separate 1099-R form for each user, with that ID filled in as metadata.

In QF, the text is automatically extracted (I tried all scan options) and all the lines and the form shape go away, which I certainly don't want. Is there a way I can achieve this without distorting the actual file? I basically want to save a TIFF image of each page along with the extracted ID in its metadata.

I tried generating pages for the PDF that contains all the forms, but every time, after 100 or so pages, it gives me the error "Failed to load image".

The PDF is stored locally. I have also applied the hotfix, but nothing changed.

Any help is greatly appreciated. 

Answer

APPROVED ANSWER
replied on February 5, 2016

Hi Junaid,

The issue is that the Zone OCR process is leaking memory when it processes a TIFF JPEG document. We're looking into this, but since the documents you are processing are basically black and white forms, please do the following:

In your PDF Quick Fields session, go into Scan > Configure Scan Source > Document Content and check the option to "convert images to black & white".

In your TIFF Quick Fields session, add a color removal process to the pre-classification processing section.

Regards

Replies

replied on February 3, 2016

Please open a support case and attach the session and sample document.

replied on February 3, 2016

"all the lines and form shape goes away"

Does this mean you are using a Form Extraction process in order to extract the text correctly? If so, have you looked into using a Local Image Enhancement so that this processing does not affect the final image?

replied on February 3, 2016

Even if I disable all processes and just run the QF session, the output file in the Document Manager is still just text.

replied on February 3, 2016

Ah ok. So you'll need to generate pages. If you generate pages, do you see the image as expected for all of the pages except the ones that have the "Failed to load image" issue?

replied on February 3, 2016

Yes. When I generate pages first in the client and run the session on the TIFF instead of the PDF, everything looks good for the first 100 pages and the account number is extracted, but after 100 or so pages the error "Failed to load image" pops up. I have checked the images generated after the first 100, and they seem fine in Windows preview.

replied on February 3, 2016

I agree with Miruna: please open a support case. 
