Question

Import Agent OCR Text from pdf adds spaces

asked on January 17, 2019

When importing documents into the repository using Import Agent or Email Archive, extra spaces appear in the extracted text. Are there specific settings we should check to correct this?

If I open the client and generate text on the same document, the text is created perfectly.

Here is an example:

Any help is greatly appreciated.  This happens with both Import Agent and Email Archive.  Both are on the same server.  The LF repository was recently migrated to a new Windows 2016 server.

Replies

replied on January 17, 2019

Hi Ken,

 

Which versions of Import Agent and OCR are you using? Are the same OCR configuration and the same OCR version used in both Import Agent and the Laserfiche Client?

In Import Agent, the OCR configuration is here: Import Agent Configuration Utility->Profile->OCR

In the Laserfiche Client, the OCR configuration is here: Tools->Options->Generate Text->General

Thanks,

Qinmei

replied on January 22, 2019

Win 2012 R2 Server with Import Agent 10.3.1.479 and OCR 19.2 version 10.2.1

Client machine is Win 10 with LF 10.4 with OCR 19.2 version 10.2.1 and OCR 18.5 version 9.1.0

LF Server has OCR 19.2 version 10.2.1

replied on January 24, 2019

Importing with Import Agent and with Email Archive produces the same result: erratic spaces in the text.

replied on January 22, 2019

If you select the text from the original PDF when it is in PDF form and paste it into another document, are the spaces there?

replied on January 24, 2019

No. The text looks perfect.
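
To confirm that the two extractions differ only in whitespace, and not in the recognized characters themselves, the text generated in the client can be compared against the text produced by Import Agent. The following is a minimal sketch, assuming both texts have been copied out as plain strings; the function names and sample strings are hypothetical and are not part of any Laserfiche tool.

```python
import re


def strip_ws(text: str) -> str:
    """Remove all whitespace so only the recognized characters remain."""
    return re.sub(r"\s+", "", text)


def compare_extractions(reference: str, suspect: str) -> dict:
    """Compare a known-good extraction against a suspect one.

    reference: text generated in the client (assumed correct).
    suspect:   text produced by the import path (assumed to have extra spaces).
    """
    ref_spaces = sum(ch.isspace() for ch in reference)
    sus_spaces = sum(ch.isspace() for ch in suspect)
    return {
        # True when the only differences are whitespace characters.
        "same_characters": strip_ws(reference) == strip_ws(suspect),
        # Positive when the suspect text contains more whitespace.
        "extra_whitespace": sus_spaces - ref_spaces,
    }


if __name__ == "__main__":
    # Hypothetical sample strings, for illustration only.
    reference = "Invoice number 12345"
    suspect = "Inv oice num ber 1 2345"
    print(compare_extractions(reference, suspect))
```

If `same_characters` is true and `extra_whitespace` is positive, the characters were recognized identically and only the spacing differs, which would point toward a difference in how each path extracts or generates the text rather than toward the document itself.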
