Question

Non-OCR-able documents

Version 10 How To

Updated February 8, 2018

asked on February 7, 2018

Using the Distributed Computing Cluster, we run an overnight search for documents that have not been OCR'd, and then OCR them. We have a lot of pictures in our repository, and the nightly process gets bogged down by pictures that have no text to OCR, yet they come up in the search every night.

Aside from having users apply a tag so that these are bypassed in the search, does anyone have any clever tricks for dealing with this? When I try to use a "Do not OCR" tag, I can't seem to get the syntax right. I've tried {LF:Tags<>"Do not OCR"} and {LF:Tags~="Do not OCR"}.

Is there a way to grab those documents and manually (all at once) add some text to the text area?

We're on version 10.2.


Replies

replied on February 8, 2018

I add a date range to the search so that I only try to OCR documents that have been created or modified within (let's say) the last 2 weeks. This way, the process still attempts to OCR each document, but if no text has been produced by the time the document ages out of that date range, it drops off the OCR list.
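To make the rolling window concrete, here is a minimal Python sketch of the cutoff logic only. It is not Laserfiche search syntax; the function name, the document names, and the 14-day default are all made up for illustration.

```python
from datetime import date, timedelta


def in_ocr_window(last_modified: date, today: date, window_days: int = 14) -> bool:
    """Return True if the document changed within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return cutoff <= last_modified <= today


# Hypothetical documents mapped to their last-modified dates.
today = date(2018, 2, 8)
docs = {
    "scan_a": date(2018, 2, 1),    # inside the 2-week window
    "photo_b": date(2018, 1, 10),  # aged out, so it is no longer retried
}

to_ocr = [name for name, modified in docs.items() if in_ocr_window(modified, today)]
print(to_ocr)  # ['scan_a']
```

A picture that never yields text is retried for at most two weeks and then stops showing up, with no tagging needed.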


replied on February 8, 2018

Search with this syntax: ({LF:id > 0} - {LF:Tags="Do not OCR"})

You might also have your workflow assign a tag such as 'OCR Complete' to each document as it is sent to your DCC. Your image files would be processed one more time, but you could then filter them out of the next search.
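As I read it, the minus sign in the search above works like a set difference: start from every entry, then remove the ones carrying the tag. A rough Python sketch of that idea, with made-up entry IDs and a hypothetical helper name (this is not a Laserfiche API):

```python
def ocr_candidates(all_ids, tagged_ids):
    """Return every ID that does not carry the exclusion tag, in sorted order."""
    return sorted(set(all_ids) - set(tagged_ids))


# Hypothetical entry IDs, for illustration only.
all_ids = [101, 102, 103, 104]  # stands in for {LF:id > 0}
do_not_ocr = [102, 104]         # stands in for {LF:Tags="Do not OCR"}

print(ocr_candidates(all_ids, do_not_ocr))  # [101, 103]
```

The same subtraction would apply to an 'OCR Complete' tag: anything already tagged falls out of the next night's list.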


replied on February 8, 2018

Have you looked to see whether the picture files are considerably larger in size? Could you base your search on page size or document size and only pull files under a certain size?

Applying a tag after the OCR is a good idea to prevent files from getting re-pulled every night, but it would not stop new files from getting checked each night. If you don't have that many new files coming in, that might be okay.
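If size does turn out to separate pictures from scanned text, the filter could look something like this sketch. The 2 MB threshold, the file names, and the sizes are assumptions for illustration; you would need to check your own repository to pick a real cutoff.

```python
SIZE_LIMIT_BYTES = 2 * 1024 * 1024  # hypothetical cutoff, tune for your repository


def small_enough(docs, limit=SIZE_LIMIT_BYTES):
    """Keep only documents whose size in bytes is under the limit."""
    return [name for name, size in docs if size < limit]


# Hypothetical (name, size in bytes) pairs.
docs = [("invoice.tif", 180_000), ("site_photo.jpg", 6_500_000)]

print(small_enough(docs))  # ['invoice.tif']
```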

