Question

SDK text extractor not working on some workstations

SDK
asked on November 17, 2017

I'm currently using documentservices v. 9.2.0

I have validated that text extraction does work from the client in all instances by manually extracting text.  I'm doing this without generating pages... straight up text extraction from a pdf supplied in the Laserfiche client install path.  Specifically, I'm testing on Sample5.pdf but I see the same behavior with other pdfs.

Here's the code:

if (te.IsExtensionSupported(docInfo.Extension)) Console.WriteLine(docInfo.Extension + " supported.");
else Console.WriteLine(docInfo.Extension + " not supported.");
bool extractSuccess = te.ExtractFrom(docInfo);
if (extractSuccess) Console.WriteLine("Text Extraction worked.");
else Console.WriteLine("Extraction Failed");
docInfo.Save();

In all cases, the first line returns "pdf supported."

On my development workstation, extractSuccess is true.  On my test workstations, extractSuccess is false.

Unfortunately, the ExtractFrom method returns only a boolean value.  I can't figure out any way to tell what is going wrong... it only tells me that the extraction failed.

My question is: Is there any way to get more information about why this might be failing?

Replies

replied on November 20, 2017

Text extractors rely on an ifilter being installed for the file type.  Have you installed one for pdf on the test machine?
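
One way to compare the machines is to look at how the PDF ifilter is registered on each. The sketch below is only a diagnostic idea and does not use any Laserfiche API. It assumes the filter follows the usual Windows registration layout (a PersistentHandler under the extension key, pointing at a CLSID that lists the IFilter interface), and it checks both the 32-bit and 64-bit registry views on the assumption that a bitness mismatch between the calling process and the installed filter could be a factor. The class and method names (IFilterProbe, FindFilterClsid) are made up for this example, and some filters register through other paths that this check will miss.

```csharp
using System;
using Microsoft.Win32;

static class IFilterProbe
{
    // IID of the IFilter COM interface, as used in Windows filter registration.
    const string IFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // Returns the CLSID of the IFilter registered for the extension in the
    // given registry view, or null if nothing is found along the usual path.
    static string FindFilterClsid(string extension, RegistryView view)
    {
        using (RegistryKey root = RegistryKey.OpenBaseKey(RegistryHive.ClassesRoot, view))
        using (RegistryKey handlerKey = root.OpenSubKey(extension + @"\PersistentHandler"))
        {
            // The default value of this key is the persistent handler GUID.
            string handler = handlerKey == null ? null : handlerKey.GetValue("") as string;
            if (handler == null) return null;

            string filterPath = @"CLSID\" + handler + @"\PersistentAddinsRegistered\" + IFilterIid;
            using (RegistryKey filterKey = root.OpenSubKey(filterPath))
            {
                return filterKey == null ? null : filterKey.GetValue("") as string;
            }
        }
    }

    static void Main()
    {
        // Assumption: the bitness of the calling process matters for which filter is loaded.
        Console.WriteLine("Process is 64-bit: " + Environment.Is64BitProcess);

        foreach (RegistryView view in new[] { RegistryView.Registry32, RegistryView.Registry64 })
        {
            string clsid = FindFilterClsid(".pdf", view);
            Console.WriteLine(view + ": " + (clsid ?? "no IFilter found for .pdf"));
        }
    }
}
```

Running this on both the development workstation and a failing test workstation, built for the same platform target as the extraction application, would show whether the two machines actually resolve .pdf to the same filter.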

replied on December 1, 2017

Brian,

We have looked pretty carefully at the ifilters.  In this case, we have tried both the version 9.0 and version 11 pdf ifilters.  Neither seems to work.  I have validated that one end user has the exact same ifilter as I do on my development workstation.  On my workstation, pdfs generate a "true" response from the ExtractFrom method while the end user doing the test gets a "false" response.

Peter

replied on December 1, 2017

Also... I wrote a simpler application to just extract text from the entryId specified on the command line.  When I did this, I updated the API references to use the 10.0 version of repositoryaccess and documentservices.  The new code behaves the same way... it works on my development workstation but not for the end user.
