Question

Can a full-text search be done inside of a PDF?

asked on November 18, 2020

I deal with TIFF images and was asked whether a full-text search can be done on a PDF document. I was not sure how to answer.

Answer

SELECTED ANSWER
replied on November 19, 2020

Generating pages and then running OCR on them, as suggested in the replies, will be the most consistent approach.

If a PDF has a text layer, Laserfiche can index it for Full-Text Search without needing to generate pages first. Same with Microsoft Office documents.

PDFs often don't have a text layer though, and scanned ones (vs digitally created) almost never do. 
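For anyone who wants to check a PDF for a text layer before deciding whether pages need to be generated and OCR'ed, here is a rough sketch in Python. It is not anything Laserfiche provides, and `has_text_layer` is a made-up name for illustration. It assumes the PDF is unencrypted and that its content streams are either uncompressed or Flate-compressed, and it only looks for the standard text-showing operators (`Tj`, `TJ`) inside a `BT ... ET` block. Treat it as a heuristic; a real PDF parser will be more reliable.

```python
import re
import zlib

# One match per stream: group 1 is the bytes between "obj" and "stream"
# (this includes the stream's dictionary), group 2 is the raw stream body.
OBJ_STREAM_RE = re.compile(rb"obj\b(.*?)stream\r?\n(.*?)endstream", re.DOTALL)

# A text object (BT ... ET) containing a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any non-image stream appears to draw text."""
    for match in OBJ_STREAM_RE.finditer(pdf_bytes):
        dictionary, body = match.groups()
        if b"/Image" in dictionary:
            # Image streams hold pixel data, not text operators; skipping
            # them avoids accidental matches inside binary data.
            continue
        try:
            # decompressobj tolerates the end-of-line bytes that usually
            # sit between the compressed data and "endstream".
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            # Assumption: the stream is uncompressed (or uses a filter this
            # sketch does not handle), so scan the raw bytes instead.
            pass
        if TEXT_OP_RE.search(body):
            return True
    return False
```

You would call it with the file's bytes, for example `has_text_layer(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. A `False` result suggests the PDF is image-only and would need pages generated and OCR'ed to be searchable.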

Replies

replied on November 18, 2020

A belt-and-suspenders approach is to always generate pages for electronic files, then OCR and index those pages. That way, if the PDFs contain imaged text, the full-text search will still hit it.

replied on November 18, 2020

We take this approach with most documents. Storing both a PDF and OCR'ed pages gives you the best of both worlds, but generating pages significantly increases storage size, so it does come at a price.
