Question

Import agent always creates text for PDFs

asked on April 12, 2016

We have a client who has noticed some odd behavior in Import Agent. Even with the "OCR image files" option turned off, the PDFs they import still end up with extracted text. It seems that Import Agent uses the PDF's text stream to create text by default.

They want to use DCC to OCR the pages after the documents are imported. Is there a way to disable text extraction from PDF text streams, or is this not a resource-intensive process?

Answer

SELECTED ANSWER
replied on April 14, 2016

Extracting text streams directly from the PDF is not a time- or resource-intensive process. Furthermore, the result is an accurate dump of the original text stream, rather than text that has to be recreated through OCR.

If you still want to OCR the documents, you can always run them through DCC once they are imported, but there are no real resource considerations to worry about.

Replies

replied on April 14, 2016

https://www.laserfiche.com/ecmblog/tech-tip-pdf-page-generation-in-laserfiche-import-agent-9/

“In addition, Laserfiche will extract the original text stream from the PDF for searchability; if the PDF has no text stream, you can OCR the resulting pages instead.”

So is there no toggle on this?
