Workflow to run OCR overnight through DCC - how to exclude non-text files?

SELECTED ANSWER

replied on July 13, 2017

Laura,

I don't believe there's any built-in search parameter that tells you whether an image actually contains text. It's a circular problem: you want to OCR any document that has no text, but you don't know whether it has text until you OCR it, and if it turns out to have none, you're back at square one.

I think a good option would be to use a field or a tag. After you process each document, assign a field value such as OCR: True, or an informational or security tag named something like "OCR'ed". That gives you something to exclude from your search results, so you don't hit the same document over and over.
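To make the idea concrete, here is a minimal sketch of the selection logic in plain Python. It does not call Laserfiche, Workflow, or DCC; the Document class, the "OCR'ed" tag name, and the ocr callback are all hypothetical stand-ins for whatever your search and workflow actually use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

OCR_TAG = "OCR'ed"  # hypothetical tag name, as suggested above


@dataclass
class Document:
    # Hypothetical stand-in for a repository entry.
    name: str
    has_text: bool = False
    tags: Set[str] = field(default_factory=set)


def needs_ocr(doc: Document) -> bool:
    # Select only documents with no text that were never processed.
    return not doc.has_text and OCR_TAG not in doc.tags


def run_nightly_ocr(documents: List[Document],
                    ocr: Callable[[Document], bool]) -> List[str]:
    # ocr(doc) is a placeholder that returns True if any text was found.
    processed = []
    for doc in documents:
        if not needs_ocr(doc):
            continue
        doc.has_text = ocr(doc)
        # Tag the document whether or not text was found, so a blank
        # image is not picked up again on the next run.
        doc.tags.add(OCR_TAG)
        processed.append(doc.name)
    return processed
```

The key point is that the tag is applied even when OCR finds nothing, which is what breaks the loop for images that genuinely have no text.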

replied on July 17, 2017

After applying the patch in KB 1013860, this workaround is no longer necessary. DCC will generate empty text pages for blank image pages (as the rest of the Laserfiche products do), so the search in your initial post will only find documents that haven't gone through OCR yet.
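As a rough illustration of why an empty text page matters, the sketch below models a page's text as None when no text page exists and as an empty string when a text page exists but is blank. This is only an assumed model in plain Python, not how DCC or the repository stores text, and ocr_page and missing_text are hypothetical names.

```python
from typing import List, Optional


def ocr_page(image_text: str, write_empty_text: bool) -> Optional[str]:
    # image_text stands in for whatever OCR would recognize on the page.
    if image_text:
        return image_text
    # Assumed behavior: with the patch, a blank page still gets an
    # (empty) text page; without it, nothing is written.
    return "" if write_empty_text else None


def missing_text(pages: List[Optional[str]]) -> bool:
    # Models a search for documents that have a page with no text page.
    return any(page is None for page in pages)
```

Under this model, a document with a blank page keeps matching the search when nothing is written for that page, and stops matching once an empty text page is written.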

replied on July 17, 2017

Is this included in any of the updates, or is it still a standalone patch for now?

replied on July 18, 2017

It's a standalone patch for now.
