
Question

Data Consistency Checker Strategy/Tool

asked on May 27, 2022

We want to migrate user data from a Windows file system to a Laserfiche repository. Is there a way to check the consistency of the migrated data?

Does Laserfiche provide any tool to perform this check?


Replies

replied on May 27, 2022

Hi Prayagi,

There is not a specific tool for checking file system to Laserfiche repository migrations.

The most basic check would be to count the files on the file system you plan to migrate and then, once the migration is complete, check the repository properties to confirm that Laserfiche contains the same number of documents.
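For example, a quick source-side count could come from a short script like the one below. This is only a rough sketch; the UNC path is a placeholder for whatever share you plan to migrate, and you would compare the printed file count against the document count shown in the repository properties afterwards.

```python
import os

# Placeholder for the share you plan to migrate
SOURCE_ROOT = r"\\fileserver\userdata"

total_files = 0
total_folders = 0
for dirpath, dirnames, filenames in os.walk(SOURCE_ROOT):
    total_folders += len(dirnames)
    total_files += len(filenames)

print(f"Folders on the file system: {total_folders}")
print(f"Files on the file system:   {total_files}")
```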

replied on May 31, 2022

Hi Samuel,

Thank you for the information on this. We have a use case where we have to assure the client that, for the files imported using Import Agent:

1) the files are not corrupted after being transferred to the LF repository, and

2) we can produce a consolidated report on the volume of data imported (number of files, directories, size of files, ...).

What would be the best and most efficient way of getting this information? This is for around 10 TB of data.

 

If we plan to generate a SHA-1 hash for each file before transfer (written to a CSV) and then use Workflow to compare it against each transferred file once the file is in the LF repository, what would be the approximate speed at which Workflow could process this?
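Roughly, what we have in mind for the source side is something like the sketch below (the paths and CSV layout are placeholders, not our actual environment); it would also give us the file count and total size we need for the consolidated report in point 2:

```python
import csv
import hashlib
import os

# Placeholder paths - adjust to the actual share and output location
SOURCE_ROOT = r"\\fileserver\userdata"
MANIFEST = r"C:\migration\source_manifest.csv"

def sha1_of(path, chunk_size=1024 * 1024):
    """Stream the file in chunks so ~10 TB of data never has to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

file_count = 0
total_bytes = 0
with open(MANIFEST, "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["relative_path", "size_bytes", "sha1"])
    for dirpath, _, filenames in os.walk(SOURCE_ROOT):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            size = os.path.getsize(full_path)
            writer.writerow([os.path.relpath(full_path, SOURCE_ROOT), size, sha1_of(full_path)])
            file_count += 1
            total_bytes += size

print(f"Files hashed: {file_count}, total size: {total_bytes} bytes")
```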

 

We would also like to know whether LF already assigns a SHA-1 or MD5 hash value to files and whether it is available as part of the file information in a script.

replied on June 9, 2022

Laserfiche Volume Checksums generate SHA-1 hashes of entry data. Those checksums are stored somewhere in the repository database - not exactly sure where offhand. 

Having Volume Checksums enabled lets you run validation reports. It is important to note that this validation checks the hashes of the current files in the repository against the hashes created and stored in the database at the time of import (or at the time checksums were enabled, if the file was already imported).

The volume validation report does not validate that the file wasn't corrupted during the transfer to Laserfiche. 

I think you would have to get the checksums from a database query to use them in a workflow. I don't see a way to retrieve the checksums in the form you need using the Laserfiche Repository Access SDK library, which seems to return the value only as part of an error report when there is a current mismatch.

It's possible Import Agent and the repository do their own checksum validation as part of the import process, where Import Agent sends the checksum to Laserfiche Server which validates it on the receiving end and throws an import error on a mismatch. I'll check with the team on that. 

For case #2, how important is it that the report contain only stats on what Import Agent brings in, vs everything?

replied on June 12, 2022

Thank you Samuel for the detailed information on this.

If we have to get the checksums (SHA-1) using a database query, we would like to know which database table stores these hash values, so we don't need to regenerate them and can reuse the ones already calculated as part of the process (if the standard used is not SHA-1, let us know which one it is and we will use that hash for the source files too).

If it is not possible to access the SHA-1 hash values from the database, is there a way to get the actual mapped storage path for the files in the repository against each of the virtual paths shown in the repo? Since we will be importing the files in batches, a way to access the paths of only the files imported in each batch would make it efficient to calculate the hash values batch by batch (say, one set of department folders at a time). Exploring the folder structure, it looks like the files are stored in logically numbered folders as they are imported and cannot be mapped directly using the original file path.

If Import Agent is doing this checksum validation as part of the import process, is there a way we could get a report/log of the files imported? (I did see that it can log the files that couldn't be imported, with an error message, in the Windows Event Viewer. Is there any other setting that could give me more detail?)

For case #2, since we are planning to import files in phases, creating reports as the files are brought in by Import Agent in each phase would be a good strategy.

We would also be grateful for any better alternative suggestions.

replied on July 7, 2022

Hi Prayagi,

My apologies for the very delayed response. You can find the SHA-1 checksum values in the dbo.toc table, under the edoc_cksum column; the ID value in the toc table is the Entry ID. Please note that you'll only see edoc_cksum values for "electronic documents", i.e., non-image files (not Laserfiche pages). Image checksums are in the dbo.doc table, but if you're doing TIFF page generation (as opposed to, say, importing a single PNG or JPEG), those checksums won't match the "source", because pages are usually Laserfiche-generated files derived from what you import.
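If it helps, a rough sketch of pulling those values out and comparing them against a pre-import manifest might look like the following. To be clear, this is untested and several pieces are assumptions to adapt to your environment: the connection string, the CSV layout, the "tocid" name for the Entry ID column, and the storage format of edoc_cksum (it may be binary or text in your database), so verify against your own schema before relying on it.

```python
import csv
import pyodbc  # assumes the pyodbc package and a SQL Server ODBC driver are available

# Placeholders - adjust to your SQL Server and manifest location
CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=lfsql;DATABASE=LFRepo;Trusted_Connection=yes"
SOURCE_MANIFEST = r"C:\migration\source_manifest.csv"

# Load the SHA-1 hashes captured on the file system before import
# (expects columns: relative_path, size_bytes, sha1)
source_hashes = set()
with open(SOURCE_MANIFEST, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        source_hashes.add(row["sha1"].lower())

# Pull the electronic-document checksums from the repository database.
# "tocid" is an assumed column name for the Entry ID - check your schema.
repo_hashes = {}
with pyodbc.connect(CONN_STR) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT tocid, edoc_cksum FROM dbo.toc WHERE edoc_cksum IS NOT NULL")
    for entry_id, cksum in cursor.fetchall():
        # Normalize to lowercase hex whether the column is binary or text
        value = cksum.hex() if isinstance(cksum, (bytes, bytearray)) else str(cksum)
        repo_hashes[entry_id] = value.lower()

unmatched = [eid for eid, h in repo_hashes.items() if h not in source_hashes]
print(f"Electronic documents with checksums in the repository: {len(repo_hashes)}")
print(f"Entries whose checksum has no match in the source manifest: {len(unmatched)}")
```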

I haven't had a chance to connect with the right person on the Import Agent team yet to get an answer there. Hope the above helps in the meantime.

replied on June 12, 2022

Hi Prayagi, 

I wanted to let you know that I'm out of the office this week and won't be able to respond to your detailed questions until I'm back. 

Cheers, 

Sam

replied on June 13, 2022

Thank you, Samuel, for the quick response. Sure, no worries.

replied on June 22, 2022

Hi Sam, thank you for your time. Let me know of any updates or suggestions on this.
