Question

Laserfiche Won't OCR new PDF Documents

asked on April 8, 2016

We are importing scanned PDF documents into Laserfiche, and we have it set to OCR during the import process. The progress bars appear as if OCR is running, but afterwards we don't get any index or OCR information.

If we OCR the documents in Acrobat outside of Laserfiche and then import them, Laserfiche performs the full OCR during the import process as normal, but the document still isn't indexed even though it says it is. Is this an issue with the OCR engine built into Laserfiche?

Replies

replied on April 8, 2016

Text extraction for PDFs is handled differently from other file formats. If the PDF doesn't contain a text stream, you need to configure the client to generate image pages and OCR those images.
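
To tell which case a given PDF falls into before importing, a rough check outside of Laserfiche can help. The sketch below is only a heuristic and not anything Laserfiche does itself: it assumes that an image-only scan carries no font resources, so it looks for a `/Font` entry in the raw file bytes and in any Flate-compressed streams. The function name `pdf_probably_has_text` is made up for this example, and encrypted or unusually encoded PDFs may be misclassified.

```python
import re
import zlib


def pdf_probably_has_text(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes reference any font resource.

    Assumption: image-only scans normally have no /Font entries,
    while PDFs with a text layer (including Acrobat OCR output) do.
    """
    # Case 1: the resource dictionary is stored uncompressed.
    if b"/Font" in data:
        return True

    # Case 2: dictionaries live inside Flate-compressed streams
    # (for example object streams in PDF 1.5 and later).
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", data, re.DOTALL):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate data (e.g. a raw JPEG image stream); skip it.
            continue
        if b"/Font" in inflated:
            return True

    return False
```

Reading the file with `open(path, "rb").read()` and passing the bytes in is enough to use it. A result of `False` suggests the PDF is image-only, which is the case where the client has to generate image pages and OCR them.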

replied on April 8, 2016

Thanks. We tried that setting and now it at least gives us some information during the import, but it still won't actually process the OCR. It says no pages were OCR'd and the document wasn't indexed. The documents are old typed documents from the 1960s with very clear text on them, so the OCR should be able to recognize them.

replied on April 8, 2016

Try generating pages manually after the PDFs have been imported (Tasks->Generate Pages), then generate text (Tasks->Generate Text) to OCR the pages.

replied on April 8, 2016

I was able to generate the pages, but was unable to generate text from those pages.

replied on April 8, 2016

It sounds like the OCR engine might be failing to handle the images. Is there any error reported when you generate text? As an experiment, make a copy of the document, delete the edoc from the new document (Tasks->Delete Electronic Files), then try generating text. If text is not generated, it is an OCR issue. Otherwise, it might be a bug in the LF client.

replied on April 8, 2016

There are no errors generated. Everything looks like it processes correctly and normally. I did try removing the electronic files and re-generating, but still no dice, so it looks like, at least for now, we will need to continue running the OCR in Acrobat before importing into Laserfiche.

I still don't understand, though, why the text from the OCR isn't searchable even though it says it was indexed.

replied on April 8, 2016

I think at this point the best option is to open a support case so that we can troubleshoot the issue.

replied on April 12, 2016

Thanks. Where can I do that? I don't see anywhere on the support page to submit a support case.

replied on April 12, 2016

You need to go through your VAR (value-added reseller).

replied on April 10, 2016

Dear Robert,

I used to face this issue too in Laserfiche Client 9.2. I used to rectify it by using Snapshot to Generate Laserfiche Pages, instead of Extract images from PDFs.

Regards

Kirubaa
