
Question

Need Help to OCR using Workflow

asked on September 30, 2021

Hi all.

In my workflow, I'm using a C# SDK script to OCR some entries.

This is the code:

namespace WorkflowActivity.Scripting.ScriptSDK
{
    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Data.SqlClient;
    using System.Text;
    using Laserfiche.RepositoryAccess;
    using Laserfiche.DocumentServices;

    /// <summary>
    /// Provides one or more methods that can be run when the workflow scripting activity performs.
    /// </summary>
    public class Script1 : RAScriptClass104
    {
        /// <summary>
        /// This method is run when the activity is performed.
        /// </summary>
        protected override void Execute()
        {
            // Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
            string sError = "None";
            try
            {
                //  Retrieves a document to be processed with OCR.
                if ((BoundEntryInfo.EntryType == EntryType.Document))
                {
                    using (DocumentInfo Doc = (DocumentInfo)BoundEntryInfo)
                    {
                        Doc.Lock(LockType.Exclusive);
                        try
                        {
                            //  Instantiates a new OCR engine.
                            using (OcrEngine ocr = OcrEngine.LoadEngine())
                            {
                                //  Configure OCR options.
                                ocr.AutoOrient = true;
                                ocr.Decolumnize = true;
                                ocr.OptimizationMode = OcrOptimizationMode.Accuracy;
                                //  Generate text for all pages of the given document.
                                PageSet ps = Doc.AllPages;
                                ocr.Run(Doc, ps);
                            }
                        }
                        finally
                        {
                            //  Unlock the document even if the OCR step throws.
                            Doc.Unlock();
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                sError = ex.Message;
                WorkflowApi.TrackError(ex.Message);
            }
            SetTokenValue("Script_Error", sError);
        }

    }
}

But the results are not very good.

I tried to use regular expressions to extract some information, but because of the OCR errors, many of the patterns fail to match.

Can I improve the OCR?

The language is French.

 

Thanks in advance.

Regards


Replies

replied on October 1, 2021

Hi Oliver, have you tried any of the option settings, such as Deskew or Despeckle? Also, I don't see where your code specifies the language. I wonder whether it is falling back to the default language because you are running from a script.

replied on October 1, 2021

Hi Steve,

Thanks for your reply.

This is not my code; I found it on the forum. I don't know all the options or how to set them. That's why I need some help ^^

replied on October 4, 2021

Just curious why you decided to use a script to OCR the documents rather than using the Schedule OCR workflow activity?

replied on October 4, 2021

Hi Steve.

 

I'm not yet comfortable with the Schedule OCR workflow activity.

I need more practice with it.

replied on October 4, 2021

The OcrEngine object has a Language property. You could try adding the following line before the call to ocr.Run:

ocr.Language = "French";

 

replied on October 4, 2021

Hi Bert,

 

Nice to see you again ^^

Thanks a lot for your help.

I'm going to try it this week and will get back to you ASAP.

 

Regards

replied on October 4, 2021

Bert,

I tried your solution.

The OCR is better but not perfect.

 

Without your solution

 

With your solution

 

The text is: "Date de clôture de l'exercice"

 

Can we use a word library to improve recognition?
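
Whether the OCR engine itself accepts a custom word list is not established in this thread. One possible workaround is to correct the OCR output afterwards by comparing it against a list of labels you expect to find. The sketch below is only an illustration under that assumption: the class OcrLabelMatcher, its methods, and the maxDistance threshold are hypothetical names, not part of the Laserfiche SDK. It strips accents, then picks the known label with the smallest Levenshtein (edit) distance.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper (not part of the Laserfiche SDK): maps noisy OCR text
// to the closest entry in a list of expected labels.
public static class OcrLabelMatcher
{
    // Lowercases the text and strips diacritics so that "clôture" and
    // "cloture" compare as equal.
    public static string Normalize(string text)
    {
        string decomposed = text.ToLowerInvariant().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Classic two-row Levenshtein distance: the number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
        {
            prev[j] = j;
        }
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.Length];
    }

    // Returns the known label closest to the OCR text, or null when even the
    // best candidate needs more than maxDistance edits.
    public static string ClosestLabel(string ocrText, IEnumerable<string> knownLabels, int maxDistance)
    {
        string best = null;
        int bestDistance = int.MaxValue;
        string normalizedOcr = Normalize(ocrText);
        foreach (string label in knownLabels)
        {
            int distance = Levenshtein(normalizedOcr, Normalize(label));
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = label;
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }
}
```

For example, calling OcrLabelMatcher.ClosestLabel with a misread line and a label list that contains "Date de clôture de l'exercice" should return that label as long as the line is within maxDistance edits of it. The threshold is a trade-off: too small and real matches are rejected, too large and unrelated lines get "corrected" into a label.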

 

 
