PDF Generate Pages and Text

SELECTED ANSWER

replied on October 31, 2025

Ok, if it's an image-only PDF, you do have to generate pages first and then OCR those generated pages to get text. The repository web client can extract text directly from a PDF that already contains text, but it needs help to do full OCR text extraction from an image page. Since you are on a self-hosted system, OCR text generation on pages through the repository web client is handled by DCC, a separately installed component that is also included in the Laserfiche package.

Have you confirmed that DCC (also known as Distributed Computing Cluster) is up and running and connected to your repository web client application server, and checked the DCC service for any issues listed there? The search service itself doesn't matter if no text is being generated in the first place. The offloading to DCC exists because OCR text generation (from image pages) is very processor-intensive and could interfere with normal operations if performed directly on the application server. Your SP is correct that the locally installed Windows client bypasses this, because it can run the OCR text generation locally. Using the Windows client is not mandatory; it just doesn't require a connection to the DCC service to offload the operation.

1 0

replied on September 22, 2025

Hi Vikki,

It's not generally necessary to do BOTH generate pages and generate search text separately. Generally, generating pages will also add the text to the search index; generating text is usually something you do separately for documents that already have images. Also, when used through the repository web client, generating text is not an immediate process: it's offloaded either to DCC (for self-hosted) or to the Cloud text generation service (for Laserfiche Cloud), and there may be a delay before the text is returned.

Lastly, you are in the file view (viewing as a PDF), and the search you are using is the embedded Adobe Reader search, so none of the above options will actually affect it. You can tell because that's an Adobe Reader toolbar, not the Laserfiche document viewer toolbar. It can only search text if the PDF has an embedded text stream, which is a property of the original PDF itself, not something produced by generating text through Laserfiche. Once you have generated pages, toggle to the page view, and Laserfiche search will then be used for search operations.
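If you want a quick way to tell whether a PDF is likely image-only before uploading it, here is a rough sketch in Python. It is only a heuristic, not anything Laserfiche does internally: it assumes that a PDF with an embedded text layer references at least one font resource (`/Font`), either in plain sight or inside a Flate-compressed stream. The function name `pdf_probably_has_text_layer` is made up for this example, and encrypted PDFs or PDFs using other compression filters are not handled.

```python
import re
import zlib


def pdf_probably_has_text_layer(data: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes reference a font resource.

    Image-only (scanned) PDFs usually contain no /Font entries, while
    PDFs with an embedded text layer do. This is an approximation only.
    """
    # Uncompressed dictionaries expose the font resource name directly.
    if b"/Font" in data:
        return True

    # Newer PDFs may hide dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and look again.
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", data, re.DOTALL):
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example, a JPEG image stream)
        if b"/Font" in inflated:
            return True
    return False
```

You could call it as `pdf_probably_has_text_layer(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder file name. A `False` result suggests the file will need pages generated and OCR run before any text search can work.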

1 0

replied on September 22, 2025

Thank you for your reply, Justin. I can't get the search to work when the file view is toggled on or off.

0 0

replied on September 23, 2025

I just wanted to mention that we have verified that the full text search service is running on our application server and we also restarted it, just in case. I have "generated pages" on several other PDFs in the repo this morning with the same result.

0 0

replied on September 23, 2025

Does it work if you do a text search across the repository as a whole? One interesting thing is that the word IS getting highlighted, which means text has been generated there and associated with the page for the search to find it; it's just not showing up in the doc viewer search.

0 0

replied on September 23, 2025

It does not find it. I highlighted the search term myself, just to show where it exists. Unfortunately, a document text search across the repo as a whole does not find it either.

1 0

replied on September 23, 2025

Ok, sounds like that's something you'd want to open up a support case on at this point, to dig into why it's not making it into your search index.

1 0

replied on September 23, 2025

Thank you!

0 0

replied on October 31, 2025

@████████ The answer from Laserfiche to our solution provider was: "To get words out of a document, the process extracts text from it. Since it is an image only PDF, they will have to generate LF pages first then OCR. The simplest way to do this for PDFs such as this is to use a locally installed Windows client."

Our users only use the Web Client, and we have been generating pages and then generating text. I am confused, because this seems to be exactly what they are saying we should do, but when we do it, there is still no searchable text. Do you have any other ideas on this issue?

0 0

replied on November 3, 2025

This is very helpful. I know DCC was set up in the past but something related here must be the issue. We will check that. Thank you!

0 0
