Question

Import Agent, XML list file and OCR

asked on August 26, 2015

I'm running into an issue where Import Agent only generates searchable text (OCR) for part of a .tif file being brought into Laserfiche. I tested the IA session with the TIFF file by itself, and it imported with text generated for all pages. However, as soon as the .xml file is included, IA does not generate text for the very last page. Any suggestions?

Answer

APPROVED ANSWER
replied on August 26, 2015

In the support case that your reseller had opened, the issue was determined to be a bug with the OCR engine. If you OCR the TIFF document with the "decolumnize" option disabled, then the last page of the document will have text generated for it.

When you enable OCR in the XML list file, the decolumnize option is hardcoded to be enabled. For now, a workaround is to manually re-OCR the last page of the document with the "decolumnize" option disabled, using something like the Client.

We have an enhancement request to make the OCR options configurable in XML list files.
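
For anyone who wants to check which pages need the manual re-OCR after an import, here is a minimal Python sketch. It assumes you already have each page's extracted text as a list of strings, one per page; how that text is pulled out of the repository is not covered here. The name `pages_missing_text` is a hypothetical helper, not part of Import Agent or any Laserfiche API.

```python
from typing import List


def pages_missing_text(page_texts: List[str]) -> List[int]:
    """Return 1-based page numbers whose text is empty or whitespace only."""
    return [
        page_number
        for page_number, text in enumerate(page_texts, start=1)
        if not text.strip()
    ]


if __name__ == "__main__":
    # Hypothetical three-page document where OCR skipped the last page.
    sample = ["Invoice 1001", "Line items", ""]
    print(pages_missing_text(sample))
```

Any page number it reports is a candidate for re-OCR in the Client with "decolumnize" turned off.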

Replies

replied on August 26, 2015

Please contact your reseller to open a support case so we can further investigate. It'd be helpful if you could provide a copy of the TIFF document as well as the associated .xml file. A briefcase export of a sample document with the template referenced in the .xml file would be good to have also.

replied on August 26, 2015

I've already contacted my reseller, and to my knowledge there is or was a support case open. However, none of the fixes Laserfiche has proposed have helped so far. I've attached copies of both the TIFF and the .xml file as you requested. The very last page of the TIFF is the problem area. As I mentioned earlier, if I set up IA to bring in only the .tif, searchable text is generated for every page. Add the .xml to the equation and bam, no searchable text for the last page.

XML list file.JPG