Question

LF Cloud OCR PDFs without text layer

asked on August 4, 2022

We have a customer who is importing PDFs that have no text layer, and they are not getting OCRed. Is there an option to specify an alternative method so that the resulting page images get OCRed? They do not want to have to install and use the desktop client.


Replies

replied on August 4, 2022

Hi Bert,

Do they have both "Generate Text" and "Generate Pages" selected on import?

I believe PDFs without a text layer require page generation for OCR.

replied on August 4, 2022

Yes, they have both selected, and the pages are generated (the PDF is not kept), but the text is never generated.

replied on August 4, 2022

I discussed this with the team, and we'd like you to file a support case that includes several sample PDFs.

Based on the descriptions so far, we're unsure whether this is a bug, expected behavior resulting from a suboptimal design, or an edge case possibly involving the customer's specific PDFs, and we would like to investigate further.

While you mentioned that they don't want to use the desktop Laserfiche Client, if there is an immediate need to have the text available, we are fairly confident that regenerating the document text in the client would work. The desktop client generates the text locally and sends it to the server, so unless something goes wrong with the local OCR process, you're guaranteed to get it.

In contrast, text generation/OCR in LF Cloud is a background asynchronous process where documents get sent to a queue to be processed by a pool of OCR workers, conceptually similar to Laserfiche Distributed Computing Cluster (DCC) for self-hosted systems.
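To make the difference between the two paths concrete, here is a minimal Python sketch. It is a conceptual illustration only, not Laserfiche's actual implementation: `ocr_page`, `regenerate_text_locally`, and `OcrQueue` are hypothetical names, and the OCR step is a placeholder. In the desktop-client path the text exists as soon as the call returns; in the queued path an import only enqueues a job, and the text appears whenever a background worker gets to it, which is why a document can sit without text for a while after import.

```python
import queue
import threading


def ocr_page(page_image: str) -> str:
    """Hypothetical placeholder for a real OCR engine."""
    return f"<text of {page_image}>"


def regenerate_text_locally(pages: list[str]) -> list[str]:
    """Desktop-client style (conceptual): OCR runs synchronously, so the text
    is already in hand when it gets sent to the server."""
    return [ocr_page(p) for p in pages]


class OcrQueue:
    """Cloud style (conceptual): submit() enqueues a job and returns right away;
    a pool of worker threads processes the queue in the background."""

    def __init__(self, num_workers: int = 3) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self.text: dict[str, list[str]] = {}  # doc_id -> page text, filled in later
        for _ in range(num_workers):
            # Daemon threads loop forever and exit with the interpreter.
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, doc_id: str, pages: list[str]) -> None:
        # Does not wait for OCR; the import "finishes" before any text exists.
        self._jobs.put((doc_id, pages))

    def _work(self) -> None:
        while True:
            doc_id, pages = self._jobs.get()
            try:
                result = [ocr_page(p) for p in pages]
                with self._lock:
                    self.text[doc_id] = result
            finally:
                self._jobs.task_done()

    def wait_until_idle(self) -> None:
        # Blocks until every submitted job has been processed.
        self._jobs.join()


q = OcrQueue()
q.submit("doc-1", ["page1.tif", "page2.tif"])
# Here "doc-1" may or may not have text yet, depending on worker timing.
q.wait_until_idle()
print(q.text["doc-1"])  # ['<text of page1.tif>', '<text of page2.tif>']
```

The only thing the queued design guarantees is that the text shows up eventually, so checking back later (rather than immediately after import) is the natural way to confirm whether OCR actually ran.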

replied on August 5, 2022

Bert, can you check again now? It may have been OCRed overnight by the asynchronous process that Sam mentioned.

replied on June 23, 2023

"In contrast, text generation/OCR in LF Cloud is a background asynchronous process where documents get sent to a queue to be processed by a pool of OCR workers, conceptually similar to Laserfiche Distributed Computing Cluster (DCC)"

This is awesome!
