Hi all,
I want to use Workflow to perform OCR and extract the text. Any help on how to achieve this would be appreciated, especially regarding the libraries needed and the Laserfiche references.
Thanks, Prudence
First of all, OCR is a resource-intensive process, so running it on your main production Workflow Server is not recommended. Set up a separate Workflow Server to run the OCR workflow instead.
That said...
Add a reference to Laserfiche.DocumentServices and add the corresponding using statement at the top of the script.
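As a rough sketch, the top of the script could look something like the lines below. This assumes the script template does not already declare these namespaces; if it already includes the RepositoryAccess line, only the DocumentServices one needs adding.

```csharp
// Assumption: these sit alongside the using directives the script template already provides.
using System;                        // Exception
using Laserfiche.RepositoryAccess;   // DocumentInfo, EntryType, LockType, PageSet
using Laserfiche.DocumentServices;   // OcrEngine, OcrOptimizationMode
```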
Then in your Execute code block:
protected override void Execute()
{
    // BoundEntryInfo gives access to the starting entry; RASession is the Repository Access session.
    string sError = "None";
    try
    {
        // Only documents can be processed with OCR.
        if (BoundEntryInfo.EntryType == EntryType.Document)
        {
            using (DocumentInfo Doc = (DocumentInfo)BoundEntryInfo)
            {
                Doc.Lock(LockType.Exclusive);
                try
                {
                    // Instantiate a new OCR engine.
                    using (OcrEngine ocr = OcrEngine.LoadEngine())
                    {
                        // Configure OCR options.
                        ocr.AutoOrient = true;
                        ocr.Decolumnize = true;
                        ocr.OptimizationMode = OcrOptimizationMode.Accuracy;
                        // Generate text for all pages of the document.
                        PageSet ps = Doc.AllPages;
                        ocr.Run(Doc, ps);
                    }
                }
                finally
                {
                    // Unlock the document even if OCR fails, so it is not left locked.
                    Doc.Unlock();
                }
            }
        }
    }
    catch (Exception ex)
    {
        sError = ex.Message;
        WorkflowApi.TrackError(ex.Message);
    }
    // Report the outcome through the Script_Error token ("None" on success).
    SetTokenValue("Script_Error", sError);
}
Hi Bert, do you have the DLLs available?
Thanks
Both the RepositoryAccess and DocumentServices DLLs need to be from the same build. You can reference the v11 DLLs with the following paths:
Those wouldn't be enough. You would have to install the desktop client on this machine to get the OCR engine. And I'll reiterate what Bert said: this is resource intensive and likely to interfere with Workflow performance.
The supported way to OCR documents from Workflow is through Distributed Computing Cluster.
I'm sorry, I misunderstood. I thought you were supposed to send the files to Laserfiche. But thank you very much for replying.
@████████ You do send the documents to the Laserfiche repository. Once the documents are in the repository, you use workflow to send them to DCC to get OCRed.