
Question

Workflow to run OCR overnight through DCC - how to exclude non-text files?

asked on July 13, 2017

I currently have a Workflow running nightly that uses Search Repository to find entries that have LF pages but no OCR text, then uses Schedule OCR to send them to the Distributed Computing Cluster. It runs as intended; however, we have many documents whose LF pages contain no text and therefore will never have OCR'ed pages (photos, for instance). Is there a way to adjust the search syntax to exclude these files so they aren't re-run every night?

({LF:Name="*", Type="DB"} - {LF:Ext="*"}) & ({LF:AssociatedPages="Y"} & {LF:OCR=none})

Answer

SELECTED ANSWER
replied on July 13, 2017

Laura,

I don't believe there's any built-in search parameter that tells you whether an image actually contains text. It's a circular problem: you want to OCR every document that has no text, but you can't know whether a page contains text until you try to OCR it, and if OCR finds no text, the document still matches your search and you're back at square one.

I think a good option might be to use a field or a tag. After you process each document, assign a field value such as OCR: True, or apply an information or security tag named something like "OCR'ed". That gives you something to exclude in your search, so the workflow doesn't pick up the same document night after night.
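
To illustrate why the tag breaks the loop, here is a small Python model of the nightly selection. It is only a sketch under stated assumptions: `Entry`, `nightly_search`, `run_ocr`, and `nightly_workflow` are hypothetical names, not Laserfiche APIs, and the model treats tagging as happening right after OCR runs. In a real workflow, Schedule OCR through DCC is asynchronous, so tagging immediately after scheduling would also exclude an entry whose OCR job later fails.

```python
from dataclasses import dataclass, field

# Hypothetical tag name taken from the suggestion above.
OCR_TAG = "OCR'ed"


@dataclass
class Entry:
    """Hypothetical stand-in for a repository entry (not a Laserfiche type)."""
    name: str
    has_pages: bool                          # entry has image pages
    has_ocr_text: bool = False               # at least one page has OCR text
    tags: set[str] = field(default_factory=set)


def nightly_search(entries):
    """Model of the search: pages present, no OCR text, not already tagged."""
    return [
        e for e in entries
        if e.has_pages and not e.has_ocr_text and OCR_TAG not in e.tags
    ]


def run_ocr(entry, image_contains_text):
    """Simulated OCR: text only appears if the image actually contained text."""
    if image_contains_text:
        entry.has_ocr_text = True
    # Tag regardless of the OCR result so the entry drops out of later searches.
    entry.tags.add(OCR_TAG)


def nightly_workflow(entries, text_in_image):
    """One nightly run: search, OCR each hit, return the names processed."""
    batch = nightly_search(entries)
    for e in batch:
        run_ocr(e, text_in_image[e.name])
    return [e.name for e in batch]
```

With an invoice (has text) and a photo (no text), the first run processes both; on the second run neither matches, whereas without the tag check the photo would match every night.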

replied on July 13, 2017

Thanks, Jason! A tag would probably suit my needs. I'll try adding that step to my workflow.

replied on July 17, 2017

After you apply the patch in KB 1013860, this workaround is no longer necessary. DCC will then generate empty text pages for blank image pages, as the other Laserfiche products already do, so the search in your original post will only find documents that haven't been through OCR yet.
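
The key distinction in the patch is between a page that has no text page at all and a page whose text page exists but is empty. A minimal Python model of that, assuming each page's text is `None` when it was never OCR'ed and a string (possibly `""`) once OCR has run, and assuming `{LF:OCR=none}` matches documents where no page has a text page. The names `ocr_page` and `matches_ocr_none` are hypothetical, not Laserfiche APIs.

```python
from typing import List, Optional


def ocr_page(image_has_text: bool, patched: bool) -> Optional[str]:
    """Simulated DCC OCR of one page; returns the page's resulting text."""
    if image_has_text:
        return "recognized text"
    # Patched DCC writes an empty text page for a blank image;
    # unpatched DCC leaves the page with no text page at all.
    return "" if patched else None


def matches_ocr_none(pages: List[Optional[str]]) -> bool:
    """Rough analogue of {LF:AssociatedPages="Y"} & {LF:OCR=none}:
    the document has pages and none of them has a text page."""
    return bool(pages) and all(p is None for p in pages)
```

Under this model, a photo OCR'ed by unpatched DCC still matches the nightly search, while the same photo OCR'ed by patched DCC (an empty text page) no longer does.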

replied on July 17, 2017

Is this included in any of the updates, or is it still a standalone patch for now?

replied on July 18, 2017

It's a standalone patch for now.
