Question

Is there a way to generate PDF text (OCR) and not pages when importing a file?

asked on March 11, 2017

Hi,

The reason I ask is that when we generate pages for a file that is 100KB in size, it grows to 4.1MB. This is not ideal for the client environment, as that is roughly 40x the original size. The PDF files have color in them, but even when I tried the monochrome option, the size became 20MB.

I'm not sure if this has been asked before, but I would like to know whether it is possible to save PDF documents with text but without pages. Alternatively, is there a way to generate pages without the file size growing that much?

Thank you

Ziad

Replies

replied on March 13, 2017

Hi Ziad

Go into the LF Options in the Client or WebAccess and turn off (uncheck) Generate Laserfiche Pages. This way it will just import the PDF without converting it. Also, note that just above that option, Generate Searchable Text can still be selected to OCR your document.

replied on March 13, 2017

Steve,

Thank you for this. Do you know if this will generate text for all types of documents and not just PDFs? I know there is another area to extract text from Office documents.

Thanks again

Ziad

replied on March 14, 2017

Hi Ziad

It depends on how the documents are imported into Laserfiche. Typically with the thick client, all files generate text at the time of import.

If you are using drag and drop with WebAccess, you need to install the PDF iFilter pack and the Office iFilter pack on the WebAccess server to extract text from those document types. You can find more info about those in the help files. If you have other file types where text is not generated on import, you will need to set up DCC (Distributed Computing Cluster) and then set up a Workflow with a Generate Text Task.

One thing you will notice is that Forms saved into the repository will not be OCR'd. As above, you will have to use Workflow to OCR them if you want to make them text searchable.
