
Question

Workflow for OCR'ing documents overnight

asked on February 3, 2016

Hello,

I have a client on version 8.3 who wants to scan everything into Laserfiche, then have a workflow kick off after office hours, search the repository for un-OCR'd documents, and OCR them overnight. I see people recommending a cluster, but that requires another machine, and they do not have another machine to cluster with. What activities are needed in Workflow to locate all the un-OCR'd documents and then kick off the OCR process on this one machine after office hours?

Thanks 


Replies

replied on February 3, 2016

You can use the Distributed Computing Cluster (DCC) on one machine. However, we don't recommend putting it on the Workflow server because OCR is very resource intensive and it will compete with the Workflow Server for CPU. If you're running the Laserfiche and SQL Servers on the Workflow Server as well, then I definitely would recommend against using that for OCR.

That said, if your Workflow Server machine is fairly powerful, you can use DCC on it and limit it to one worker. OCR will be slower, but in theory, the effect on Workflow will be minimal.

replied on February 4, 2016

If everything is on just one machine, are there alternatives you could recommend? Also, would the only two activities I need in the Workflow Designer be "Find Entries" and "Schedule OCR"?

They seem to be pushing the envelope and getting impatient, so any help or resolution would be extremely helpful. All servers (Workflow, SQL, Laserfiche) are on one machine, and they do not have any extra machines to move the servers to. What workflow can I draw up in 8.3 to run a search and OCR at night when people aren't at work?

 

Thanks

replied on February 4, 2016

You would need Search Repository and Schedule OCR. The OCR one is not available in Workflow 8.3.

The only way you can do it in 8.3 would be to script it. But then you'd be on your own for handling OCR errors and such.

DCC does not need a server-grade machine. For a "cluster" with one node you can use a workstation.
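For anyone who does go the scripting route, the error handling mentioned above might look roughly like the sketch below. It is written in Python only to illustrate the pattern and is not Laserfiche SDK code: `run_ocr_batch`, `OcrRunReport`, and the `ocr_entry` callback are hypothetical names, and the callback stands in for whatever call actually OCRs one entry. The idea is to record each failure and keep going so one bad document does not stop the overnight run.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class OcrRunReport:
    """Outcome of one overnight pass (hypothetical structure)."""
    succeeded: List[int] = field(default_factory=list)
    failed: List[Tuple[int, str]] = field(default_factory=list)


def run_ocr_batch(entry_ids: Iterable[int],
                  ocr_entry: Callable[[int], None]) -> OcrRunReport:
    """OCR each entry in turn, recording failures instead of aborting.

    `ocr_entry` is a placeholder for the real per-document OCR call;
    it is assumed to raise an exception when OCR fails.
    """
    report = OcrRunReport()
    for entry_id in entry_ids:
        try:
            ocr_entry(entry_id)
        except Exception as exc:  # keep going; log the failure for review
            report.failed.append((entry_id, str(exc)))
        else:
            report.succeeded.append(entry_id)
    return report


if __name__ == "__main__":
    # Stand-in OCR function that fails on one entry, for demonstration only.
    def fake_ocr(entry_id: int) -> None:
        if entry_id == 2:
            raise RuntimeError("bad page")

    result = run_ocr_batch([1, 2, 3], fake_ocr)
    print("succeeded:", result.succeeded)
    print("failed:", result.failed)
```

The failed list gives you something to review the next morning or retry the following night, which is the part the built-in Schedule OCR activity would otherwise handle for you.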

replied on February 4, 2016

So I would be able to install the cluster on the same machine as the Workflow, Laserfiche, and SQL servers? Good to know. And would I be able to upgrade just Workflow to 9 and get Schedule OCR, or would that also cause errors?

replied on February 4, 2016

You can update just Workflow to version 9 and keep the Laserfiche server as 8.3.

replied on February 4, 2016

We have a very robust server that should be able to handle having the Workflow, Laserfiche, and SQL servers on it as well as the cluster.

So what I am gathering from you is: we install the cluster on the same server, upgrade Workflow to 9.0, keep Laserfiche at 8.3, and create a workflow with the activities "Find Entries" and "Schedule OCR", and we should be good to go?

replied on February 4, 2016

Search Repository, not Find Entries. Yes to the rest.

replied on February 5, 2016

We are trying to create this overnight OCR workflow in our own environment first. I can get the workflow to initiate and run, but it hangs on the Search Repository activity (http://screencast.com/t/YYGv4ytFVl). I have DCC installed and I tested all the connections; they seem to be working fine. I entered the search syntax (http://screencast.com/t/fe2SKVZPA), and when I test it, it returns the right number of results. Lastly, I have the Schedule OCR activity set to run on the output entries from Search Repository (http://screencast.com/t/TlVTAySlj). Is there something I'm missing that would explain why it hangs at the Search Repository stage?

replied on February 5, 2016

If you test that search in the Client, how long does it take?

replied on February 5, 2016

It takes about half a second. I only have 3 documents that should, and do, come up when I run that search syntax in the Client. I changed the workflow to a business process instead of using the scheduler, just to see if I could make it work, and I received this: http://screencast.com/t/HnwtML6Pf

replied on February 9, 2016

Any new ideas regarding this problem I'm facing?

replied on February 19, 2016

I was able to get DCC OCR working on our in-house machine on version 10. However, I am now trying to do the same on our client's server on version 9.2. I double-checked that everything is set up the same way, but the workflow just keeps running and I never see the number of entries decrease. I set the scheduler for 6 p.m. last night, and it says it kicked off then, but no progress was made: the number of un-OCR'd documents was the same when I checked this morning. Any help will be greatly appreciated.

replied on February 19, 2016

Hi Kyle, 

At this point, you'll probably want to open a support case for this, especially if it's specific to one of the environments.

replied on February 22, 2016

I was informed that the OCR process for Workflow and DCC is only meant for .tiff files. Most of the documents found are PDFs. What are some alternative automated ways to do this every night instead of having to do it manually?
