Question

OCR quality

asked on June 20, 2016

Hi All,

 

I'm having trouble getting quality OCR'd pages. Even if I am OCRing a PDF (where the text is crisp and clear), the text turns into something like this:

 

It turns out fine if the pages are scanned into Laserfiche as .tif, but for some reason, if it starts as a PDF, it becomes quite ugly and disfigured. My settings are set for highest accuracy. Are there any other options to consider?


Answer

SELECTED ANSWER
replied on June 20, 2016

Correct, you'd probably need the update. If your VAR is able to submit a sample document to our support group, we can confirm that it addresses the issue you are seeing. 

replied on June 20, 2016

Thanks, Justin!


Replies

replied on June 20, 2016

Hi Robyn,

Is there a specific reason that you are using OCR for PDFs in that second screenshot instead of native text extraction? Native text extraction will pull the text directly from the PDF, bypassing the need to OCR it in the first place and therefore removing the possibility of text artifacts. 

Also, to clarify, OCR text just refers to the text stream itself that is used for searching. Is that the issue you are running into, or are you getting bad image pages themselves? That first screenshot looks to be an image, not text. What version of the Client are you using here? There were some issues with generating image pages from PDFs that were recently addressed in an update that just went out last week.

replied on June 20, 2016

So I rediscovered why I had changed the OCR settings for PDFs: for some reason, using native text extraction doesn't seem to work. I'm assuming this is the case because the PDFs are scanned rather than created natively.
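
One rough way to test that assumption outside Laserfiche is to check whether the PDF references any fonts at all. A scanned, image-only PDF usually has none, so there is no text for native extraction to pull. The sketch below is only a heuristic and is not how Laserfiche itself decides anything; the function name `looks_image_only` is made up for illustration, and it will miss cases such as encrypted files or streams that use a filter other than Flate.

```python
import re
import zlib


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: return True if no /Font reference is found in the PDF.

    Assumption: a PDF with no font resources carries no extractable text
    layer. This is a rough check, not a full PDF parse.
    """
    # Fonts referenced in uncompressed objects show up in the raw bytes.
    if b"/Font" in pdf_bytes:
        return False

    # Newer PDFs may keep dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and look there as well.
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image stream)
        if b"/Font" in data:
            return False

    return True
```

To try it, read a file with `open(path, "rb").read()` and pass the bytes in. A result of `True` suggests the document is a scan and will need OCR rather than native text extraction; a result of `False` only means a font is referenced somewhere, which can also happen when a scan already carries a hidden OCR text layer.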

replied on June 20, 2016

These are my settings

Laserfiche Pages Settings.PNG
replied on June 20, 2016

Hi Justin,

There is no specific reason for that, no. I'll switch it back to native text extraction and see how that affects the quality.

And you're right, it's a picture of the image when pages are generated. I'm using version 9.2.1, so I suppose this won't be corrected until we get the update?
