Question


Find Duplicate Entries

asked on October 15, 2014

Hello,

I want to create a workflow, scheduled to run at a particular time, that looks for files with duplicate names across the whole repository.


Can someone suggest how to accomplish this, or has anyone already built one?

Regards,

S

 


Replies

replied on October 16, 2014

I would run a custom query that gets the duplicates directly from the SQL database associated with the Laserfiche repository.

Something like

-- Names shared by two or more documents (etype = -2 restricts the count to documents)
select count(name), name
from toc
where etype = -2
group by name
having count(name) >= 2

Depending on how many duplicates you have now, you may need to limit the number of results returned.
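To see what that query returns, here is a minimal Python sketch that runs the same GROUP BY / HAVING query against a tiny in-memory SQLite table. The table is a hypothetical, heavily simplified stand-in for the repository's toc table (only id, name and etype), and the row cap uses SQLite's LIMIT; on SQL Server the equivalent would be SELECT TOP (n).

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the repository's "toc" table:
# only the columns the query uses. etype = -2 is treated as "document" (as in
# the query above); 0 is used for a folder purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toc (id INTEGER PRIMARY KEY, name TEXT, etype INTEGER)")
conn.executemany(
    "INSERT INTO toc (name, etype) VALUES (?, ?)",
    [
        ("Invoice 001", -2),
        ("Invoice 001", -2),
        ("Invoice 001", -2),
        ("Report", -2),
        ("Report", -2),
        ("Report", 0),        # a folder with the same name is not counted
        ("Unique doc", -2),
    ],
)

# Same query as above, plus ORDER BY and a row cap.
# SQLite spells the cap LIMIT; SQL Server uses SELECT TOP (n) instead.
MAX_ROWS = 100
rows = conn.execute(
    """
    SELECT name, COUNT(name) AS copies
    FROM toc
    WHERE etype = -2
    GROUP BY name
    HAVING COUNT(name) >= 2
    ORDER BY copies DESC, name
    LIMIT ?
    """,
    (MAX_ROWS,),
).fetchall()

for name, copies in rows:
    print(f"{name}: {copies}")
```

Sorting by the number of copies puts the worst offenders first, which helps if you can only afford to process a capped batch per run.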

Then I would iterate through the query results and, for each duplicate name, build a name search and store it in a token.
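A sketch of building those name searches as strings, assuming the {LF:name="...",type="D"} syntax used later in this thread and that | combines searches with OR. The helper names are hypothetical, and names containing double quotes are not escaped.

```python
from typing import Iterable


def name_search(name: str) -> str:
    """Search string for documents named exactly `name`.

    Mirrors the {LF:name="...",type="D"} form used later in this thread.
    Assumption: `name` contains no double quotes (no escaping is done).
    """
    return '{LF:name="%s",type="D"}' % name


def combined_search(names: Iterable[str]) -> str:
    """OR several name searches together with '|' (hypothetical helper)."""
    return " | ".join(name_search(n) for n in names)


# Names as they might come back from the duplicate query (made-up examples).
duplicate_names = ["Invoice 001", "Report"]
per_name_tokens = [name_search(n) for n in duplicate_names]
print(per_name_tokens)
print(combined_search(duplicate_names))
```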

The next step depends on whether Web Access is available or not.

  • If Web Access is available, then use a Generate Web Access URL with the search and email that link to the user. That way, the search runs when they click the link from the email.
  • If Web Access is not an option, then you'd run the search from Workflow, then go through the results with For Each Entry and create shortcuts.
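For the second option, here is a minimal sketch of the "go through the results and create shortcuts" step, written as plain Python rather than Workflow activities. The Entry type, the IDs and the \Duplicates Review folder are hypothetical; in Workflow, a create-shortcut step inside For Each Entry would do the actual work.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    # Hypothetical minimal view of one search hit.
    entry_id: int
    name: str


REVIEW_FOLDER = r"\Duplicates Review"  # hypothetical folder the user checks


def plan_shortcuts(results: List[Entry]) -> Dict[str, List[int]]:
    """Group search hits by name and keep names with more than one entry.

    This only plans the work: in Workflow, a create-shortcut step inside
    For Each Entry would act on each ID.
    """
    by_name: Dict[str, List[int]] = defaultdict(list)
    for entry in results:
        by_name[entry.name].append(entry.entry_id)
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


results = [Entry(42, "Report"), Entry(10, "Report"), Entry(7, "Invoice 001")]
plan = plan_shortcuts(results)
for name, ids in plan.items():
    for entry_id in ids:
        print(f"create shortcut to entry {entry_id} ({name}) in {REVIEW_FOLDER}")
```

Grouping by name first means one search covering several names still produces a per-name list the user can review together.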

replied on May 9, 2018

Hi Miruna,

Just checking on this, as I'm missing something with the URL. I get a link for the first search string, but the subsequent results aren't part of the link; they're just plain text from the token, so the URL doesn't cover the whole multi-value token. For example:

http://lfserver/laserfiche/index.aspx?db=repository#view=search;search=name1 | name2 | name3 | etc

What am I doing wrong please?

Thanks,

Mike

 

replied on May 10, 2018

That's because the URL contains spaces. Most mail clients only underline (link) a URL up to the first space.

If you right-click the token in your email and choose Token Editor from the context menu, you can add the Encode URI function, which encodes the spaces so the whole URL stays underlined.
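A sketch of what the encoding does to the example URL, assuming Workflow's Encode URI behaves like JavaScript's encodeURI (that equivalence is an assumption). Python's urllib.parse.quote with encodeURI's set of unescaped characters approximates it: spaces become %20 and | becomes %7C, so the link no longer breaks.

```python
from urllib.parse import quote

# Characters JavaScript's encodeURI leaves unescaped, apart from letters,
# digits and "-_.~", which quote() never escapes anyway. Treating Workflow's
# Encode URI function as equivalent to encodeURI is an assumption.
ENCODE_URI_SAFE = ";,/?:@&=+$!*'()#"


def encode_uri(text: str) -> str:
    """Percent-encode spaces, '|', quotes, braces, etc., like encodeURI."""
    return quote(text, safe=ENCODE_URI_SAFE)


# Base URL copied from the example above (server and repository are placeholders).
BASE = "http://lfserver/laserfiche/index.aspx?db=repository"

search = "name1 | name2 | name3"
url = BASE + "#view=search;search=" + encode_uri(search)
print(url)
# http://lfserver/laserfiche/index.aspx?db=repository#view=search;search=name1%20%7C%20name2%20%7C%20name3
```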

replied on May 10, 2018

Thanks, Miruna.

replied on May 16, 2018

I have been playing with this workflow, which I can set to run once a month. It runs through all the documents looking for duplicate names, records the names of any duplicates in a token, and then reports them to me in an email.

The last test ran successfully (on targeted test folders).

[Image: Find Duplicate Documents.png]

replied on May 16, 2018

Connie, that second search looks wrong. Shouldn't it only be searching for documents with the same name as the current entry? As it is, it's searching for all documents.

replied on May 16, 2018

I tried a number of configurations, and it wasn't until I added the second loop that it successfully compared each document it found against every other document in the repository.

It does look more complicated than it should be.  If you can show me a simplified version, I would love to reconfigure mine.

replied on May 16, 2018

No, I meant this search:

It's exactly the same as the first one, it's looking for all documents.

replied on May 16, 2018

Yes. Inside "For Each Entry", it has to search the whole repository again and compare the "Current Entry" with every other document in the repository. Is there another way to do this?

replied on May 16, 2018

Thanks for posting your workflow, Connie. The environment we're trying to clean up ended up with, in some cases, up to 20 duplicates because multiple people scanned the same documents when a school moved location, so it's good to see other ideas.

replied on May 17, 2018

I'm not explaining this well. The first search returns all documents in the repository. Then for each one of those, the second search also returns all documents in the repository. So the search count will always be more than 2, regardless of whether the current document in For Each has duplicates or not.

This search is also likely to be slow in any reasonably large repository. For example, in a repository with 1,000 documents, you'd run a search that returns all 1,000 documents 1,001 times: once for the outer search and once more for each document in the loop.
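The arithmetic and the false positives can be checked with a small simulation. This is plain Python, not Workflow; it models the "inner search returns everything" design against a hypothetical repository that has no duplicates at all.

```python
from typing import List, Tuple


def naive_search_cost(n_docs: int) -> Tuple[int, int]:
    """Searches run and rows returned when the inner search also returns everything.

    One outer search returns all n_docs documents; then, for each of them,
    For Each Entry runs an inner search that again returns all n_docs documents.
    """
    searches = 1 + n_docs
    rows_returned = n_docs + n_docs * n_docs
    return searches, rows_returned


def naive_flag(all_docs: List[str]) -> List[bool]:
    """Flag each document the way the 'search everything twice' design does.

    The inner search ignores the current entry, so its hit count is always
    len(all_docs); with 2+ documents, every entry gets flagged.
    """
    return [len(all_docs) >= 2 for _current in all_docs]


def same_name_flag(all_docs: List[str]) -> List[bool]:
    """Flag a document only if some other document shares its name."""
    return [all_docs.count(name) >= 2 for name in all_docs]


docs = ["Report", "Invoice 001", "Minutes"]  # hypothetical: no duplicates
print(naive_search_cost(1000))   # (1001, 1001000)
print(naive_flag(docs))          # [True, True, True]  -> all false positives
print(same_name_flag(docs))      # [False, False, False]
```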

A more efficient way to do it would be something like this:

{LF:name="%(ForEachEntry_CurrentEntry_Name)",type="D"} & {LF:id<>%(ForEachEntry_CurrentEntry_ID)}

That way you specifically search for documents that have the same name as the current entry and are not the current entry.
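A sketch of how the tokens fill in that search and which entries it should match, using hypothetical Doc records in place of the For Each Entry tokens. The filter in other_copies only mimics the query's meaning (same name, different ID); in practice the repository search does the filtering.

```python
from typing import List, NamedTuple


class Doc(NamedTuple):
    # Hypothetical stand-ins for %(ForEachEntry_CurrentEntry_ID) and
    # %(ForEachEntry_CurrentEntry_Name).
    entry_id: int
    name: str


# Same shape as the search above; the %(...)s placeholders are filled per entry.
QUERY_TEMPLATE = '{LF:name="%(name)s",type="D"} & {LF:id<>%(entry_id)s}'


def build_query(doc: Doc) -> str:
    """Substitute the current entry's name and ID into the search string."""
    return QUERY_TEMPLATE % {"name": doc.name, "entry_id": doc.entry_id}


def other_copies(doc: Doc, repository: List[Doc]) -> List[int]:
    """IDs the search should return: same name, but not the current entry."""
    return [d.entry_id for d in repository
            if d.name == doc.name and d.entry_id != doc.entry_id]


repository = [Doc(1, "Report"), Doc(2, "Report"), Doc(3, "Minutes")]
for doc in repository:
    hits = other_copies(doc, repository)
    if hits:  # any hit means the current entry has at least one duplicate
        print(build_query(doc), "->", hits)
```

A zero-hit search now means "no duplicate", so the count check in the workflow becomes a simple "greater than 0" test.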

replied on October 15, 2014

How big is the repository? Are all entry names expected to be unique? What do you plan on doing when you find duplicates?

replied on October 16, 2014


The repository is about 500 GB and will grow into terabytes later.

Yes, entry names are expected to be the same.

The idea is to put a shortcut in a folder as a notification, and then the user decides what to do with it.
 
