Question

Inaccurate OCR Results when Importing predefined files using Import Agent

asked on November 19, 2014

Because Import Agent does not currently OCR Arabic documents, an external OCR program is used to create the image file along with a text file of the recognized text. However, Import Agent seems to misinterpret that text file: the text that ends up in Laserfiche is nonsensical characters.

I'm not sure what is going on here. Where in the process is the text being converted from "مبادئ توجيهية لتعديل دليل البراءات" to "البراءات"? Is this an Import Agent issue, or is a setting incorrect?
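One common way text turns into nonsense characters is a mismatch between how a file was written and how it is read: the bytes are intact, but the reader decodes them with the wrong encoding. The sketch below illustrates that with the title above. It assumes, purely for illustration, that the reader treats UTF-8 bytes as Windows-1252 (`cp1252`); the thread does not say which encoding Import Agent actually assumes, and the helper names are hypothetical.

```python
# Sketch only: shows how UTF-8 bytes become "nonsense" when a reader decodes
# them with the wrong codec. Windows-1252 is an illustrative assumption; the
# thread does not say which encoding Import Agent actually falls back to.


def misread_utf8(text: str, assumed_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with a different codec."""
    raw = text.encode("utf-8")
    # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
    return raw.decode(assumed_codec, errors="replace")


def repair_latin1_misread(garbled: str) -> str:
    """Undo a Latin-1 misread: Latin-1 maps each byte to one code point 1:1."""
    return garbled.encode("latin-1").decode("utf-8")


title = "مبادئ توجيهية لتعديل دليل البراءات"
garbled = misread_utf8(title)

# Escaped so it prints on any console: Latin characters such as \xd9 and
# \u2026, not Arabic letters.
print(ascii(garbled[:8]))

# Each Arabic letter takes 2 bytes in UTF-8, and each byte became one character.
print(len(title) < len(garbled) == len(title.encode("utf-8")))  # True

# The bytes themselves were intact; only their interpretation was wrong.
print(repair_latin1_misread(misread_utf8(title, "latin-1")) == title)  # True
```

If the garbled output in Laserfiche looks like runs of characters such as "Ù" and "Ø", that pattern is typical of UTF-8 Arabic being read as a single-byte Western code page.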

A screenshot is below. It shows the .lst file, the .txt file output by our OCR engine, and the results of running those files through Import Agent.

Answer

Approved answer
replied on November 19, 2014

*Edit*

What's the encoding of the generated text file? I tested a Unicode (UTF-16) text file and it worked fine, but UTF-8 seems to cause this issue. Can you confirm? Also, can the user's process be changed so that it generates a Unicode text file?
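To answer the encoding question and, if needed, produce a Unicode copy of the OCR output, a small script can check the file's byte-order mark (BOM) and re-encode it. This is a minimal sketch that assumes "Unicode" here means UTF-16 little-endian with a BOM, which is what Windows tools such as Notepad label "Unicode"; the function names and file names are hypothetical, and it has not been tested against Import Agent itself.

```python
import codecs
from pathlib import Path


def detect_encoding(data: bytes) -> str:
    """Rough guess at a text file's encoding from its BOM (UTF-32 not handled)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a BOM
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"  # what Windows tools usually label "Unicode"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: valid UTF-8 (or plain ASCII), otherwise probably an ANSI code page.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


def to_utf16le_with_bom(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (BOM optional) as UTF-16 LE with a leading BOM."""
    text = data.decode("utf-8-sig")  # strips a UTF-8 BOM if one is present
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-16 LE copy of a UTF-8 text file."""
    dst.write_bytes(to_utf16le_with_bom(src.read_bytes()))


sample = "مبادئ توجيهية".encode("utf-8")
print(detect_encoding(sample))  # utf-8
converted = to_utf16le_with_bom(sample)
print(detect_encoding(converted))  # utf-16-le
print(converted.decode("utf-16") == "مبادئ توجيهية")  # True

# Hypothetical file names, for a real OCR output file:
# convert_file(Path("ocr_output.txt"), Path("ocr_output_utf16.txt"))
```

The same check can be done by hand in a hex viewer: a file starting with `FF FE` is UTF-16 LE, and one starting with `EF BB BF` is UTF-8 with a BOM.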

*Edit #2*

If you use the XML list import process available in Import Agent 9.0, UTF-8 text files should be handled properly. For an example, see "Image and Text Example.xml" in C:\Program Files (x86)\Laserfiche\Import Agent\List File Examples.
