Question

Feature Request - File Migration Tool

asked on January 31

Hi All,

 

I'm sure this has been asked before but can't seem to find anything.

 

We often have to migrate data from Windows folders into Laserfiche (sometimes TBs of data). Whilst Import Agent has the capability to do this, it isn't the most interactive way of doing things and, from experience, falls over frequently with large datasets. Quick Fields is of course also an option, but it involves a double step of import then store.

We generally opt for a manual import process into the Windows client, along with some manual auditing to confirm counts. We'll then process the documents using Workflow, with pattern matching and DB lookups, to route the documents around and index them accordingly.
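As a rough illustration of the count auditing mentioned above, here is a minimal Python sketch that tallies files per source folder and reports mismatches against counts taken from the repository. The function names are hypothetical, and the `imported` dictionary is assumed to come from whatever report or search you already use on the Laserfiche side; nothing here calls a Laserfiche API.

```python
from collections import Counter
from pathlib import Path


def count_files_by_folder(root: str) -> Counter:
    """Map each folder (relative to root, '.' for root itself) to its file count."""
    root_path = Path(root)
    counts: Counter = Counter()
    for path in root_path.rglob("*"):
        if path.is_file():
            counts[path.parent.relative_to(root_path).as_posix()] += 1
    return counts


def diff_counts(source: Counter, imported: dict) -> dict:
    """Return {folder: (source_count, imported_count)} for folders that differ.

    `imported` is assumed to be built separately from the repository side.
    """
    folders = set(source) | set(imported)
    return {
        folder: (source.get(folder, 0), imported.get(folder, 0))
        for folder in sorted(folders)
        if source.get(folder, 0) != imported.get(folder, 0)
    }
```

An empty result from `diff_counts` means every folder's count matched; anything else is a short list of folders to re-check by hand.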

 

A super useful tool might be some kind of migration utility, with error logging and a way to automate some of the import steps (basically a mix of Import Agent, Quick Fields and the Windows client).

I can see this being more of a requirement with larger cloud migrations from network drives/Windows folders.


Is this something that's ever been considered?

 

Cheers!

Chris

3 0

Answer

SELECTED ANSWER
replied on February 10

Hi Chris,

Re: 

Even tried writing our own SDK method at one point, but that presented its own set of challenges.

We have an internally developed, unofficial high-volume migration tool built with the SDK we may be able to share. We'll reach out.

A key point with high-volume Import Agent migrations is to ensure you are not doing page generation or OCR at import time with IA. Those options drop import speeds by an order of magnitude, and the highly compute-intensive OCR and page generation processes are more likely the cause of "instability" on the server than Import Agent itself. Use Distributed Computing Cluster (DCC) to do page generation and OCR asynchronously on different servers.

3 0

Replies

replied on January 31

To add to that, the ability to exclude certain file types (such as .tmp files, or files starting with ~$), instead of only being able to specify file types to include, would be helpful too.
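Until something like that exists, one workaround is to filter the source tree before handing it to the importer. The sketch below is only an illustration in Python: the pattern list and function names are assumptions, and it simply yields the files that survive the exclusion rules so they can be copied to a staging folder or fed to another tool.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Exclusion patterns taken from the examples above; extend as needed.
EXCLUDE_PATTERNS = ("*.tmp", "~$*")


def is_excluded(name: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """True if the file name matches any exclusion pattern (case-insensitive)."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


def iter_importable(root: str, patterns=EXCLUDE_PATTERNS):
    """Yield every file under root whose name is not excluded."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and not is_excluded(path.name, patterns):
            yield path
```

Matching is done on the file name only, so a pattern such as `~$*` catches Office lock files regardless of which folder they sit in.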

2 0
replied on January 31

I've migrated millions of pages to Laserfiche using Import Agent, and have not experienced it "falling over frequently". The main issue I ran into during the migration was that I had a workflow trigger upon import, and that spawned too many workflow processes because the Import Agent would import files faster than the workflows could run on them.

I worked around this by setting a parent workflow to only search for the most recent (X quantity) entries each hour and, within a for-each loop, run the child workflow that would perform the work (renaming, moving, setting metadata, setting a tag indicating it has been processed, etc.) and wait until complete. I would also make the parent workflow only run for 55 minutes. This way, there was only ever 1 parent and 1 child migration workflow running at a time. After I set this up, I had this automation running 24/7 for about 3 months until the migration was complete and never experienced any other major issues. Occasionally, the Import Agent would fail importing files, but that was usually because the files were corrupt.
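The throttling pattern described above (one parent run per hour, a capped batch, a 55-minute budget, one child at a time) can be sketched in plain Python. This is not Workflow Designer configuration; `fetch_pending` and `process_entry` are hypothetical stand-ins for the search activity and the child workflow, and the default batch size is an arbitrary placeholder.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_batch(
    fetch_pending: Callable[[int], Iterable[T]],
    process_entry: Callable[[T], None],
    batch_size: int = 500,
    time_budget_s: float = 55 * 60,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Process at most batch_size entries, stopping once the time budget is spent.

    process_entry is called synchronously, so only one entry is ever in
    flight, mirroring the "1 parent and 1 child" arrangement.
    """
    deadline = clock() + time_budget_s
    processed = 0
    for entry in fetch_pending(batch_size):
        if clock() >= deadline:
            break  # leave the rest for the next scheduled run
        process_entry(entry)
        processed += 1
    return processed
```

Scheduling `run_batch` once an hour with a 55-minute budget leaves a gap between runs, so two parent runs should not overlap.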

What are the symptoms of "falling over" that you've witnessed?

2 0
replied on February 3

Hi Kevin,

 

Thanks for the detailed response, and appreciate 'falling over' is a bit vague.

 

What I struggled with in my testing is that when it stops, it's hard to 'unpick' why it stopped, as there isn't much of an interface, any real feedback, or even an indication of where it was up to. Yes, there are the Windows logs, but even then it's hard to determine the cause of the failure. Regular migration failures we see are unexpected file types, paths that are too long, unsupported file names/types, corrupt files, etc. Sometimes it just stops with no error at all, and restarting the service seems to make it burst back into life.

If the windows files are all neatly organised and have the correct path length etc. and are 'clean' then yes absolutely IA would be the way to go, but I can't say I've done a migration in the last 20 years where that was in fact the case.
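Some of these failure causes can be flagged before an import is attempted. The following is a minimal pre-flight sketch in Python under a few assumptions: 260 characters is the classic Windows MAX_PATH limit (the limit that actually matters for your importer may differ), the allowed extension set is purely hypothetical, and a zero-byte file is treated only as a hint of corruption since real corruption checks are format-specific.

```python
import os
from pathlib import Path

MAX_PATH = 260  # classic Windows limit; adjust to whatever your importer tolerates
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".tif", ".msg"}  # hypothetical list


def preflight(root: str, max_path: int = MAX_PATH, allowed=ALLOWED_EXTENSIONS):
    """Return a list of (full_path, reason) tuples for files likely to fail import."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if len(full) >= max_path:
                problems.append((full, "path too long"))
            if Path(name).suffix.lower() not in allowed:
                problems.append((full, "unexpected file type"))
            try:
                if os.path.getsize(full) == 0:
                    problems.append((full, "zero-byte file"))
            except OSError as exc:
                problems.append((full, f"unreadable: {exc}"))
    return problems
```

Writing the returned list to a CSV gives a worklist to clean up before the migration starts, rather than discovering the same files one stalled import at a time.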

 

Plus there is the speed element: we've tested all methods of document ingestion (Quick Fields, IA and the Windows client), and the Windows client is by far the fastest. Even tried writing our own SDK method at one point, but that presented its own set of challenges.

 

Something 'like' Import Agent would be good, but with some kind of interface that shows you what's going on under the hood in real time (like the Windows client does), plus better error handling etc.

 

Cheers!

Chris

2 0
replied on February 13

Thanks Sam, I'd be interested to test out the unofficial tool. Please can you email it to me? Let me know if you need my email again. Thanks!

 

When doing it through the Windows client, we do indeed turn off page generation and OCR. We normally stand up a few 'drone' machines to perform the OCR after the migration has finished.

1 0
replied on February 21

Morning Sam,

 

Just to provide some meaningful feedback on this, the unofficial tool is certainly the way to go. 

 

In some isolated testing here, the tool imported 2160 files in 1.04 minutes. I then imported the same files using the Windows client with generate pages and OCR disabled, and that took 18 minutes and 14 seconds.
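To put those two timings on the same scale, here is a small Python calculation. It assumes "1.04 minutes" means decimal minutes (62.4 seconds) rather than 1 minute 4 seconds; either reading puts the tool at roughly 17 times faster than the Windows client for this test set.

```python
FILES = 2160

# Assumption: "1.04 minutes" is decimal minutes, i.e. 62.4 seconds.
TOOL_SECONDS = 1.04 * 60
CLIENT_SECONDS = 18 * 60 + 14  # 18 minutes 14 seconds


def files_per_second(files: int, seconds: float) -> float:
    """Average import throughput over the whole run."""
    return files / seconds


speedup = CLIENT_SECONDS / TOOL_SECONDS

print(f"tool:    {files_per_second(FILES, TOOL_SECONDS):.1f} files/s")
print(f"client:  {files_per_second(FILES, CLIENT_SECONDS):.1f} files/s")
print(f"speedup: {speedup:.1f}x")
```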

 

So all in all, the tool seems to be the way to go. We'll continue to develop this in house, but many thanks for the starting point!

 

Cheers!

1 0
replied on February 21

Great to hear! And yeah, it's fast =D

0 0
replied on February 21

@████████ did you try the same files using Import Agent? I'm curious how it compares.

 

0 0