Question

Need Help to OCR using Workflow

Workflow SDK

Updated October 4, 2021

asked on September 30, 2021

Hi all.

In my workflow, I'm using a C# SDK script to OCR some entries.

This is the code:

namespace WorkflowActivity.Scripting.ScriptSDK
{
    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Data.SqlClient;
    using System.Text;
    using Laserfiche.RepositoryAccess;
    using Laserfiche.DocumentServices;

    /// <summary>
    /// Provides one or more methods that can be run when the workflow scripting activity performs.
    /// </summary>
    public class Script1 : RAScriptClass104
    {
        /// <summary>
        /// This method is run when the activity is performed.
        /// </summary>
        protected override void Execute()
        {
            // Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
            string sError = "None";
            try
            {
                //  Retrieves a document to be processed with OCR.
                if ((BoundEntryInfo.EntryType == EntryType.Document))
                {
                    using (DocumentInfo Doc = (DocumentInfo)BoundEntryInfo)
                    {
                        Doc.Lock(LockType.Exclusive);
                        //  Instantiates a new OCR engine.
                        using (OcrEngine ocr = OcrEngine.LoadEngine())
                        {
                            //  configure OCR options
                            ocr.AutoOrient = true;
                            ocr.Decolumnize = true;
                            ocr.OptimizationMode = OcrOptimizationMode.Accuracy;
                            //  Generate text for all pages of the given document
                            PageSet ps = Doc.AllPages;
                            ocr.Run(Doc, ps);
                        }
                        //  unlock the document
                        Doc.Unlock();
                    }
                }
            }
            catch (Exception ex)
            {
                sError = ex.Message;
                WorkflowApi.TrackError(ex.Message);
            }
            SetTokenValue("Script_Error", sError);
        }

    }
}

But the results are not very good.

I tried to use regular expressions to extract some information, but because of the OCR quality I get a lot of errors.

Can I improve the OCR?

The documents are in French.

Thanks in advance.

Regards

Replies

replied on October 1, 2021

Hi Oliver, have you tried any of the option settings such as Deskew, Despeckle, etc.? Also, I don't see where your code specifies the language; I wonder whether it is falling back to the default language because you are running from a script.

replied on October 1, 2021

Hi Steve,

Thanks for your reply.

This is not my code; I found it on the forum. I don't know all the options or how to call them. That's why I need some help ^^

replied on October 4, 2021

Just curious why you decided to use a script to OCR the documents rather than using the Schedule OCR workflow activity?

replied on October 4, 2021

Hi Steve.

I'm not yet comfortable with the Schedule OCR workflow activity.

I need more practice with it.

replied on October 4, 2021

The OcrEngine object has a Language property. You could try adding the following line before the call to ocr.Run:

ocr.Language = "French";

replied on October 4, 2021

Hi Bert,

Nice to see you again ^^

Thanks a lot for your help.

I'm going to try it this week and will get back to you ASAP.

Regards

replied on October 4, 2021

Bert,

I tried your solution.

The OCR is better but not perfect.

Without your solution

With your solution

The text is: "Date de clôture de l'exercice"

Can we use a word library to improve recognition?
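
Whether the OCR engine accepts a custom word list is not established in this thread. On the regex side, one possible workaround is to tolerate a few misread characters when searching for a known label such as the one above. The following is only a minimal sketch in plain C# with no Laserfiche calls. The class and method names (OcrLabelMatcher, Simplify, EditDistance, ContainsApprox), the sample OCR string, and the error threshold of 3 are all made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class OcrLabelMatcher
{
    // Lowercases the text and strips diacritics so "clôture" and "cloture" compare equal.
    public static string Simplify(string text)
    {
        string decomposed = text.ToLowerInvariant().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Classic Levenshtein distance computed with two rolling rows.
    public static int EditDistance(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.Length];
    }

    // Returns true if some label-sized window of the OCR text is within maxErrors edits of the label.
    public static bool ContainsApprox(string ocrText, string label, int maxErrors)
    {
        string haystack = Simplify(ocrText);
        string needle = Simplify(label);
        if (needle.Length == 0) return true;
        if (haystack.Length < needle.Length)
        {
            return EditDistance(haystack, needle) <= maxErrors;
        }
        for (int start = 0; start + needle.Length <= haystack.Length; start++)
        {
            if (EditDistance(haystack.Substring(start, needle.Length), needle) <= maxErrors)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        // Hypothetical OCR output with two misread characters ("0" for "ô", "I" for "l").
        string ocr = "Date de cl0ture de I'exercice : 31/12/2020";
        Console.WriteLine(ContainsApprox(ocr, "Date de clôture de l'exercice", 3));
    }
}
```

The fixed-size window keeps the sketch short; it handles substituted characters well and copes only roughly with inserted or dropped characters, so the threshold would need tuning against real OCR output.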
