
Question

Workflow and iFilter results.

asked on February 5, 2016

We are running into a situation where we need to access the iFilter text from within Workflow the same way you can access the OCR text results from within Workflow.  Unfortunately, the two forms of text appear to behave differently.

In our tests, we have confirmed that iFilter text is being produced, because a text search through the client returns the PDF in the result set.

But when we use the Retrieve Document Text task in Workflow (version 9.2.1), we get the error "Document has no Pages." unless we have also generated pages. (We no longer want to generate pages, for throughput reasons.) When we do generate pages on our test PDF, Workflow gets the text, i.e. the OCR results.

So the question is: should Workflow be able to access the iFilter results? If this is not a native capability (hint, hint, maybe it should be), is the only alternative an SDK script?


Replies

replied on February 5, 2016

When you open the document in the Laserfiche Client, does it have text pages?

The search engine can index PDFs using text extracted through the iFilter, but it does not set that text on the document as text pages. So it is possible for a PDF to be searchable but not have text pages. Workflow's Retrieve Document Text requires that the document have text pages.

replied on February 5, 2016

Hi Miruna, the document does not have text pages: if you open the text display pane, it shows no results, whereas it would show OCR results if there were any. So, is there a specific step you need to take to create a text page, other than using the Generate Pages utility? Generate Pages is what we are actually trying to avoid.

replied on February 8, 2016

OK, so then there isn't any text for Workflow to retrieve. Other than generating pages, there is no way to get text.

replied on February 8, 2016

Thanks, Miruna. So the use of iFilters is an alternative to OCR'ing image pages, and not an alternative to having to generate the pages in the first place. Put another way, you can use iFilters to create your searchable text so long as you have generated pages first.

replied on February 8, 2016

Hi Bill,

You can generate text pages through iFilters in a few ways, such as by running the generate text option on the documents in the Client. This calls into the iFilters to obtain the text and then saves it as a text page alongside the document. That, of course, creates text pages. It does not, fwiw, need to generate image pages.

If you have NOT done that (that is, no text pages exist), then when the full-text search engine indexes the document, it will also use iFilters to generate text at that time. This does NOT generate text pages (and, incidentally, it needs to be redone if you re-index); the text is just there so that full-text search can run on that document. Since there are no text pages in that case, there's nothing for WF to pull from.
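To make the distinction concrete, here is a small toy model in Python. It is only a sketch of the behavior described above, not Laserfiche code: `Document`, `SearchIndex`, `generate_text` and `retrieve_document_text` are all made-up names for illustration, and none of them correspond to real SDK or Workflow calls.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for a repository document (hypothetical, not an SDK type)."""
    name: str
    pdf_text: str  # the text an iFilter could extract from the PDF file
    text_pages: list = field(default_factory=list)


class SearchIndex:
    """Toy full-text index that keeps its own private copy of the text."""

    def __init__(self):
        self._indexed = {}

    def index(self, doc):
        # Use stored text pages if present; otherwise extract at index time.
        # Extracted text lives only in the index and is never saved on the document.
        text = " ".join(doc.text_pages) if doc.text_pages else doc.pdf_text
        self._indexed[doc.name] = text.lower()

    def search(self, term):
        return [name for name, text in self._indexed.items() if term.lower() in text]


def generate_text(doc):
    """Models the 'generate text' route: extract, then save as a text page."""
    doc.text_pages = [doc.pdf_text]


def retrieve_document_text(doc):
    """Models a Workflow-style read, which only looks at text pages."""
    if not doc.text_pages:
        raise ValueError("Document has no text pages.")
    return "\n".join(doc.text_pages)
```

In this model, indexing a fresh document makes it searchable while `retrieve_document_text` still fails; only after `generate_text` has stored a text page does the read succeed, which mirrors the situation in the original question.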
