Question

SDK does not generate searchable text for PDF

asked on February 10, 2022

Hello Sir,

I am using the SDK to generate searchable text within a PDF file.

But somehow the text is not generated. Can you suggest what to do about this?

Thanks,
Pratik 


Replies

replied on February 10, 2022

If the PDF does not have a text layer, then the only way to extract the text is to generate pages first and then OCR the generated pages.  Your best option may be to use DCC instead of SDK.
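
The order of operations described above can be sketched roughly as follows. This is not Laserfiche SDK code: `has_text_layer`, `generate_pages`, and `ocr_pages` are hypothetical placeholders for whatever your SDK version or DCC actually provides, and only the control flow is the point.

```python
from typing import Callable, List


def make_searchable(
    entry_id: int,
    has_text_layer: Callable[[int], bool],
    generate_pages: Callable[[int], List[int]],
    ocr_pages: Callable[[int, List[int]], None],
) -> str:
    """Return which path was taken for one PDF entry.

    All three callables are hypothetical stand-ins, not real SDK calls.
    """
    if has_text_layer(entry_id):
        # Text can be extracted directly; no OCR is needed.
        return "text-layer"
    # No text layer: image pages must exist before OCR can run.
    pages = generate_pages(entry_id)
    if not pages:
        return "no-pages"
    ocr_pages(entry_id, pages)
    return "ocr"
```

Passing the operations in as callables keeps the sketch independent of any particular SDK version; swap in the real calls once you have confirmed them in the documentation.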

replied on February 10, 2022

Please help me understand how to use DCC.

I am not familiar with DCC, and we need this option only for PDFs.

Thanks,

Pratik

replied on February 10, 2022

Do you mean that, from Laserfiche Workflow, we need to use the "Schedule OCR" activity with DCC?


Thanks,

Pratik 

replied on February 12, 2022

No, I mean that you create a new workflow that uses a "Schedule PDF Page Generation" activity, set its starting rule to run on create, and add a criterion that the extension equals pdf.

This process will generate TIFF pages from the PDF. Then your SDK script can OCR the images that were created by the DCC.
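
The rule described above (run on create, extension equals pdf) boils down to a simple filter. A minimal Python sketch of that criterion, for illustration only: `EntryEvent` and `should_schedule_page_generation` are hypothetical names, not part of Workflow or the SDK.

```python
from dataclasses import dataclass


@dataclass
class EntryEvent:
    """Hypothetical stand-in for a repository event seen by a workflow rule."""
    kind: str        # e.g. "create" or "modify"
    extension: str   # extension of the electronic document, e.g. "pdf"


def should_schedule_page_generation(event: EntryEvent) -> bool:
    # Mirror the rule: run on create, and only when the extension equals pdf.
    # Ignoring case and a leading dot is an assumption of this sketch.
    ext = event.extension.lower().lstrip(".")
    return event.kind == "create" and ext == "pdf"
```

In the real product this condition is configured in the workflow's rule rather than written as code; the sketch only makes the logic explicit.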

replied on February 14, 2022

Hello Sir,

I can't find this activity in the desktop Workflow Designer. We are actually using a cloud environment, where we need to generate searchable text for PDFs that were imported using the SDK.

What is the best method to do so in Laserfiche Cloud workflow?

Thanks,

Pratik 

replied on February 15, 2022

Schedule PDF Page Generation is a new activity in Workflow 11 for self-hosted systems.

Laserfiche Cloud does not have background PDF page generation yet. However, Laserfiche Cloud does include Import Agent, which does PDF page generation on import. So an SDK application should not be necessary to import PDFs into a cloud repository and generate their pages.

replied on February 15, 2022

So you mean that OcrEngine does not work when we use the SDK to import data into the cloud repository?

We have applied the same approach on a self-hosted system, where we are able to generate text from images. So please confirm that OcrEngine works only for images, not for PDFs.

Now the question is how to resolve this, given that we have 2 TB of data already uploaded to the cloud repository and need to find the PDFs and apply searchable text to them.

Thanks,

Pratik
