Question

Is there a way to find file duplicates (based on contents of file) within Laserfiche?

asked on September 25, 2023

As part of our implementation, we will need to locate file duplicates within Laserfiche and (ideally) replace the copies with shortcuts to the master versions of those documents. Some of the duplicates may be named differently from the originals, and attributes such as Date Edited may differ as well.

Going further, I would ideally also like to check for duplicates between departments, so that if something like a copy of a policy has been saved by more than one team, we are aware of it and can address it.

My preference would be to compare SHA-256 hashes rather than the full text of each file. Is there a way to generate hashes for all files in the system so that we can compare them and locate potential duplicates?
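
For reference, outside of Laserfiche this is roughly the kind of comparison I have in mind (a minimal sketch only; the export folder path is just a placeholder):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file's contents in chunks so large documents aren't read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under 'root' by SHA-256; any group with more than one path is a duplicate set."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

# "./export" is a placeholder for wherever the files live before/after export.
for file_hash, paths in find_duplicates(Path("./export")).items():
    print(file_hash, [str(p) for p in paths])
```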


Replies

replied on September 26, 2023

Hi Hannah,

Is this implementation a self-hosted or Laserfiche Cloud system?

replied on September 27, 2023

Hi Samuel,

Our implementation is self-hosted, but I would be interested to know the answer for Laserfiche Cloud as well.

Right now my potential workaround is to generate SHA-256 hashes for all files pre-import and maintain a database that I validate each file batch against whenever we migrate a business area's content (to catch cross-departmental duplication). If there's a way to do this in-system, though, that would be preferable!
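
Very roughly, the pre-import workaround would look something like this (a sketch only; the SQLite schema, department label and folder path are placeholders for whatever we actually track):

```python
import hashlib
import sqlite3
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_batch(db_path: str, department: str, batch_root: Path) -> None:
    """Compare a department's files against previously registered hashes, then register them."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes (sha256 TEXT, department TEXT, source_path TEXT)"
    )
    for path in batch_root.rglob("*"):
        if not path.is_file():
            continue
        file_hash = sha256_of(path)
        # Flag anything already imported for another business area.
        for dept, existing in conn.execute(
            "SELECT department, source_path FROM file_hashes WHERE sha256 = ?",
            (file_hash,),
        ):
            print(f"DUPLICATE: {path} matches {existing} (already imported for {dept})")
        conn.execute(
            "INSERT INTO file_hashes VALUES (?, ?, ?)",
            (file_hash, department, str(path)),
        )
    conn.commit()
    conn.close()

check_batch("hashes.db", "Finance", Path("./finance_export"))
```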

SELECTED ANSWER
replied on September 27, 2023

Self-hosted gives you some options because there's a native feature called Volume Checksums that automatically generates SHA-1 hashes for each file and stores them in the repository database.

I've written about this in a past Answers post and recommend you start by reading all the comments there: Data Consistency Checker Strategy/Tool

Though a different use case on the surface, the technical aspects of "generate and compare file hashes" are similar.
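
As a rough illustration of the "compare" half: once you've exported entry IDs and volume checksums from the repository database, you can group them along these lines (the CSV file and column names below are placeholders, not the actual repository schema):

```python
import csv
from collections import defaultdict

def duplicate_entries(checksum_csv: str) -> dict[str, list[str]]:
    """Group exported repository entries by their volume checksum; groups larger than one are duplicates."""
    groups: dict[str, list[str]] = defaultdict(list)
    with open(checksum_csv, newline="") as f:
        for row in csv.DictReader(f):
            groups[row["checksum"]].append(row["entry_id"])
    return {h: ids for h, ids in groups.items() if len(ids) > 1}

# "volume_checksums.csv" is a placeholder for an export of (entry_id, checksum) pairs.
for checksum, entry_ids in duplicate_entries("volume_checksums.csv").items():
    print(checksum, entry_ids)
```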

---

Laserfiche Cloud does not expose file checksums in any way you can access, so you'd have to build your own processes to generate and store the initial hashes, as well as the de-dupe checks. I'd write the hashes to a hidden metadata field on the entries in Laserfiche too, at least for "edocs" (non-image files).
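
A minimal sketch of what that could look like; how you retrieve each edoc's content and write the field value back is left as a placeholder, since it depends on the API or workflow integration you use:

```python
import hashlib

def edoc_sha256(edoc_bytes: bytes) -> str:
    """SHA-256 of an electronic document's raw contents."""
    return hashlib.sha256(edoc_bytes).hexdigest()

def duplicate_sets(entries: dict[int, bytes]) -> dict[str, list[int]]:
    """Map each hash to the entry IDs sharing it; lists longer than one are duplicate sets.

    'entries' is assumed to be {entry_id: edoc_bytes} already retrieved from the
    repository by whatever export or API mechanism you use (placeholder, not a real call).
    """
    seen: dict[str, list[int]] = {}
    for entry_id, content in entries.items():
        seen.setdefault(edoc_sha256(content), []).append(entry_id)
    return {h: ids for h, ids in seen.items() if len(ids) > 1}

# Each computed hash would then be written to the hidden metadata field on its
# entry, so later de-dupe checks can read the field instead of re-downloading files.
```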

replied on September 27, 2023

Thank you! I'll check out that post about generating hashes. Thanks too for the field recommendation; it had occurred to me to create a hidden field for the hash value, but I hadn't looked into it yet.
