Question

OCR: Extract the text using C#

asked on July 16, 2018

Hi all,

I want to use Workflow to perform OCR and extract the text. Any help on how I can achieve this would be appreciated, especially with the libraries needed and the Laserfiche references.

Thanks,
Prudence

 

 


Replies

replied on July 16, 2018

First of all, OCR is a resource-intensive process, so running it on your main production Workflow Server is not recommended. Set up a separate Workflow Server to run the OCR workflow.

That said...

Add a reference to Laserfiche.DocumentServices and add the corresponding using statement at the top of the code.
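
To make the using-statement step concrete, the directives might look like the sketch below. It assumes that OcrEngine and OcrOptimizationMode live in Laserfiche.DocumentServices and that DocumentInfo, EntryType, LockType and PageSet live in Laserfiche.RepositoryAccess (which the Workflow script template normally imports already). Check both assumptions against the SDK build you have installed.

```csharp
// Assumed namespaces; verify against your installed SDK build.
using System;                        // Exception
using Laserfiche.RepositoryAccess;   // DocumentInfo, EntryType, LockType, PageSet
using Laserfiche.DocumentServices;   // OcrEngine, OcrOptimizationMode
```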

Then in your Execute code block:

        protected override void Execute()
        {
            // Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
            string sError = "None";
            try
            {
                //  Retrieves a document to be processed with OCR.
                if ((BoundEntryInfo.EntryType == EntryType.Document))
                {
                    using (DocumentInfo Doc = (DocumentInfo)BoundEntryInfo)
                    {
                        Doc.Lock(LockType.Exclusive);
                        //  Instantiates a new OCR engine.
                        using (OcrEngine ocr = OcrEngine.LoadEngine())
                        {
                            //  configure OCR options
                            ocr.AutoOrient = true;
                            ocr.Decolumnize = true;
                            ocr.OptimizationMode = OcrOptimizationMode.Accuracy;
                            //  Generate text for all pages of the given document
                            PageSet ps = Doc.AllPages;
                            ocr.Run(Doc, ps);
                        }
                        //  unlock the document
                        Doc.Unlock();
                    }
                }
            }
            catch (Exception ex)
            {
                sError = ex.Message;
                WorkflowApi.TrackError(ex.Message);
            }
            SetTokenValue("Script_Error", sError);
        }

 

replied on November 14

Hi Bert, do you have the DLLs available?

Thanks

replied on November 14

You will need both the RepositoryAccess and DocumentServices DLLs to be the same build. You can reference the v11 DLLs at the following paths:

  • C:\Program Files\Laserfiche\Workflow\DocumentServices\11.0.0.0
  • C:\Program Files\Laserfiche\Workflow\RepositoryAccess\11.0.0.0
replied on November 14

Those wouldn't be enough. You would have to install the desktop client on this machine to get the OCR engine. And I'll reiterate what Bert said about this being resource-intensive and likely to interfere with Workflow performance.

The supported way to OCR documents from Workflow is through the Distributed Computing Cluster (DCC).

 

replied on November 17

I'm sorry, I misunderstood. I thought you were supposed to send the files to Laserfiche. But thank you very much for replying.

replied on November 17

@████████ You do send the documents to the Laserfiche repository.  Once the documents are in the repository, you use workflow to send them to DCC to get OCRed.

replied on July 16, 2018

Thank you very much, Warren, for the response.

replied on July 16, 2018

Though I think I might have asked my question the wrong way round.

 

replied on July 16, 2018

What do you mean that you may have asked the question the other way round?
