
Question

How to increase the performance of Documents Migration from one volume to another?

asked on June 6, 2017

Hello,

 

We have 700,000 docs (around 3.5 million pages) to migrate from the Default Volume to Archived Volumes, and every day we receive around 1,000 new docs that must be archived after 3 months.

 

My problem is that the current Document Migration process is too slow for that scenario, and we cannot finish the job within the time window available for this task. If we run the migration during working hours, all the users start complaining about how slow the system is, and the non-working time is taken up by automated processes, backups, etc. So I only have 4 hours available per day for the migration.

 

The client is ready to add more resources to speed up the process, but I must tell them which resources are the most relevant for this goal.

 

So my questions are:

 

1. Does the volume migration happen only at the server level, or also at the client? If only at the server, I will not save time by using many client machines in parallel. But if part of it is done at the client level, maybe running the migration from many PCs will improve the result.

 

2. What is the order of relevance of the following resources for increasing Document Migration performance?

a. Number of Processors

b. Number of Cores

c. CPU speed

d. RAM

e. Disk speed

f. Bandwidth (assuming that all the disks are in a SAN)

 

3. Would migrating entries be faster using Workflow, the SDK, or the normal Client interface?

 

4. Is there any other factor that could speed up or slow down the migration process?

 

For your reference, we are still on LF version 9.2, since the upgrade to version 10 must go through a long internal approval process.

 

Thank you for your advice and best regards,

 

Ignacio PdeA

BMB sal


Replies

replied on June 7, 2017

As Brian mentioned, my experience has also been that IO is usually the main source of any bottleneck for volume migrations.

Workflow is probably the fastest option if you can find a good way to split things up into parallel activities, but this usually requires a fair amount of testing to find the optimal balance (I would start with 1 workflow instance per available task thread).

 

We are currently running a process on over 40 million documents. The process performs a variety of actions, one of which is migrating to a different volume. We are running 32 parallel instances each round, and our processing times are averaging around 56,000 documents (250,000 pages) per hour (our volumes are not located on the app server).

Workflow is certainly capable of processing the number of documents you require in the given amount of time, as long as it has adequate resources, but you would definitely need to run some tests to identify and address your bottleneck (I'd start with disk activity on the volume server and go from there).
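As a rough back-of-the-envelope check, the sketch below estimates how many 4-hour windows the backlog would need at a given throughput. The 56,000 documents per hour figure is only the rate from our environment, used here as a placeholder; your measured rate will differ. The function name is made up for illustration, and subtracting the daily intake from capacity is a deliberately conservative assumption, since new documents only become due after 3 months.

```python
import math

def days_to_clear_backlog(backlog_docs, docs_per_hour, window_hours, new_docs_per_day):
    """Estimate the number of daily windows needed to migrate a backlog."""
    daily_capacity = docs_per_hour * window_hours
    # Assumption: reserve capacity for the daily intake as well, so it
    # reduces what is left over each day for the backlog.
    net_capacity = daily_capacity - new_docs_per_day
    if net_capacity <= 0:
        raise ValueError("throughput is too low to ever clear the backlog")
    return math.ceil(backlog_docs / net_capacity)

# Placeholder throughput taken from a different environment, not a prediction.
days = days_to_clear_backlog(700_000, 56_000, 4, 1_000)
print(days)
```

Plugging in the rate you actually measure during a test run gives a quick sense of whether the 4-hour window is enough before you invest in more hardware.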

 

To ensure your processing times remain consistent, I would highly recommend creating one or more "child" processes to run on identifiable groups or collections of documents. By using a child workflow invoked by a "parent" process, you can start them in parallel and ensure that each instance of the child process starts with a clean slate and is less affected by the accumulation of workflow activity messages.

For example, a couple of years ago we ran some workflows on millions of documents; when using a single consolidated workflow, it started at 0.2 seconds per document, but after several hours it slowed to over 1 minute per document. By breaking it down into parent and child tasks (invoked workflows), we were able to maintain the original processing time for the entire duration.

However, you want to be careful with this approach and avoid creating a new instance for every individual document. With the number of documents you are processing in such a short amount of time, that could cause performance issues when searching and viewing workflow instances (we use a separate workflow server to avoid this issue).
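The parent/child idea can be sketched in plain Python. This is not Workflow or Laserfiche SDK code: `migrate_batch` is a hypothetical stand-in for one invoked child workflow, and the batch size of 1,000 and the 32 parallel children are illustrative values you would tune through testing.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_batches(doc_ids, batch_size):
    """Yield consecutive slices of doc_ids, each at most batch_size long."""
    for start in range(0, len(doc_ids), batch_size):
        yield doc_ids[start:start + batch_size]

def migrate_batch(batch):
    """Hypothetical stand-in for one child workflow instance.

    A real child would migrate every entry in the batch; this one only
    reports how many entries it was given.
    """
    return len(batch)

def run_parent(doc_ids, batch_size=1_000, parallel_children=32):
    """Start one child per batch, at most parallel_children at a time."""
    batches = list(split_into_batches(doc_ids, batch_size))
    with ThreadPoolExecutor(max_workers=parallel_children) as pool:
        processed = sum(pool.map(migrate_batch, batches))
    return len(batches), processed

children, processed = run_parent(list(range(10_500)))
print(children, processed)
```

Each child handles a bounded group of documents, so the number of instances stays far below the number of documents while every instance still starts fresh.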

replied on June 6, 2017

Clients are not involved in the migration; it is entirely a server operation, so it doesn't matter what tool you use to run it. This operation is going to be limited by IO, probably by bandwidth between the SAN and the server, but possibly by the actual disk. That's just an educated guess, though. You should definitely run some diagnostics while you are performing the operation to confirm.
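One very crude diagnostic is to time a plain file copy between the source and destination storage and compare the rate with what the migration achieves. The sketch below is only an illustration using the Python standard library: the function name is made up, the demo copies between temporary directories rather than real volume paths, and OS caching can inflate the result, so treat it as a sanity check alongside the server's own performance counters.

```python
import os
import shutil
import tempfile
import time

def measure_copy_throughput(src_path, dst_dir):
    """Copy src_path into dst_dir and return the observed rate in MB/s."""
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    size_mb = os.path.getsize(src_path) / (1024 * 1024)
    start = time.perf_counter()
    shutil.copyfile(src_path, dst_path)
    # Guard against a zero reading on very fast copies.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_mb / elapsed

# Demo on temporary directories; point these at the real volume paths instead.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    sample = os.path.join(src_dir, "sample.bin")
    with open(sample, "wb") as f:
        f.write(os.urandom(8 * 1024 * 1024))  # 8 MB of test data
    rate = measure_copy_throughput(sample, dst_dir)
    print(f"{rate:.1f} MB/s")
```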
