Question

How can I extract text from a PDF with scanned images via the Laserfiche SDK (RepositoryAccess) or a Workflow script? Is it possible?

asked on January 20, 2014

I have a PDF that contains both text and scanned images. To obtain the text from the PDF, I use the Laserfiche OCR tool, and that works fine.

Via the Laserfiche SDK, however, OcrEngine does not work: it does not obtain any text from the images in the PDF.

As a workaround, I generate a snapshot through the SDK and then run OCR through the SDK. That works fine, but only while the Laserfiche Client is open.

 

Here is the code:

 


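using (C_AU.ClientManager clienteLaserFiche = new C_AU.ClientManager())
{
    // Get the Laserfiche Client main windows that are currently open.
    IList<C_AU.ClientWindow> carpetasAbiertas = clienteLaserFiche.GetAllClientWindows(C_AU.ClientWindowType.Main).ToList();

    // Use the first open main window (this is why the Client must be running).
    C_AU.MainWindow carpeta = carpetasAbiertas[0] as C_AU.MainWindow;

    // Generate image pages without showing any UI.
    C_AU.GeneratePagesOptions opcionesCat = new C_AU.GeneratePagesOptions();
    opcionesCat.ShowUI = false;

    // Entry ID of the PDF document to generate pages for.
    List<int> documentos = new List<int>();
    documentos.Add(915);
    carpeta.GeneratePages(documentos, opcionesCat);
}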

 

Next, I run OCR on the document with ID 915:

 

// Load the OCR engine and configure it.
motorOcr = C_DS.OcrEngine.LoadEngine();
motorOcr.AutoOrient = true;
motorOcr.Decolumnize = true;
motorOcr.OptimizationMode = C_DS.OcrOptimizationMode.Accuracy;

// Run OCR on document 915 using the existing session.
motorOcr.Run(C_RA.Document.GetDocumentInfo(915, varConexion.varSession));

 

 

 

How can I extract text from a PDF with scanned images via the Laserfiche SDK, a Workflow script, or some other way? Is it possible?

Is it possible to extract text from a PDF with images without calling the Laserfiche Client snapshot?

Jquery.pdf (160.91 KB)

Answer

APPROVED ANSWER SELECTED ANSWER
replied on January 21, 2014

The OCR engine only works on images. The OCR engine itself does not create image pages. Due to licensing agreements for 3rd party components, the ability to generate image pages from PDFs is not available in the SDK.
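
To illustrate the point above, here is a minimal sketch that runs OCR only when the entry already has image pages. It is not an official sample. It assumes that the `C_RA` and `C_DS` aliases in the question stand for `Laserfiche.RepositoryAccess` and `Laserfiche.DocumentServices`, and that `DocumentInfo` exposes a `PageCount` property; `OcrGuard` and `TryOcr` are hypothetical names. It needs the SDK assemblies and a logged-in `Session` to run.

```csharp
using Laserfiche.DocumentServices;   // assumed to be the C_DS alias in the question
using Laserfiche.RepositoryAccess;   // assumed to be the C_RA alias in the question

static class OcrGuard   // hypothetical helper, not part of the SDK
{
    // Returns false when the entry has no image pages, because the OCR
    // engine would then have nothing to read.
    public static bool TryOcr(int entryId, Session session)
    {
        DocumentInfo doc = Document.GetDocumentInfo(entryId, session);

        // Assumption: PageCount reports the number of image pages on the entry.
        // A PDF imported as an electronic document with no generated pages
        // would report zero here.
        if (doc.PageCount == 0)
        {
            return false;
        }

        // Same engine settings as in the question.
        var engine = OcrEngine.LoadEngine();
        engine.AutoOrient = true;
        engine.Decolumnize = true;
        engine.OptimizationMode = OcrOptimizationMode.Accuracy;
        engine.Run(doc);
        return true;
    }
}
```

A call such as `OcrGuard.TryOcr(915, varConexion.varSession)` returning false would mean the image pages still have to be generated by some other means before OCR can produce text.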


Replies

replied on January 20, 2014

Is it possible to extract text from a PDF with images without calling the Laserfiche Client snapshot?

replied on January 21, 2014

OK, thanks.

replied on March 6, 2016

One thought to share that we've employed elsewhere:

If Laserfiche Import Agent (LFIA) and Laserfiche Workflow are available, create a Workflow rule that pushes the PDF out to a folder monitored by an LFIA import rule, and set that rule's Advanced options to generate a Laserfiche native document (TIFF images) as it imports the PDF. Then use a second Workflow rule to merge the incoming LFIA images with the original PDF, or just keep the incoming images if you don't want the PDF.
