Hello,
I want to create a workflow that is scheduled to run at a particular time and looks for duplicate file names repository-wide.
Can someone suggest how to accomplish this, or has anyone already built one?
Regards,
S
I would run a custom query that gets the duplicates directly from the SQL database associated with the Laserfiche repository.
Something like
select count(name), name from toc where etype = -2 group by name having count(name) >=2
Depending on how many duplicates you have now, you may need to limit the number of results returned.
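As a rough illustration of the query above with a cap on the number of rows returned, here is a minimal Python sketch. It runs against an in-memory SQLite table standing in for the real repository database, so the table layout (a `toc` table with `tocid`, `name` and `etype` columns), the sample rows and the `find_duplicate_names` helper are assumptions made for the example. Only `toc`, `name` and `etype = -2` come from the query above. SQLite uses `LIMIT`; on SQL Server the cap would be written with `TOP` instead.

```python
import sqlite3


def find_duplicate_names(conn, limit=100):
    """Return (name, copies) rows for names that occur at least twice.

    Hypothetical helper: it uses the same filter and grouping as the
    query in the post, plus a cap on the number of rows returned.
    """
    query = (
        "SELECT name, COUNT(name) AS copies "
        "FROM toc "
        "WHERE etype = -2 "
        "GROUP BY name "
        "HAVING COUNT(name) >= 2 "
        "ORDER BY copies DESC, name "
        "LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


# Stand-in data: an in-memory table, not the real repository schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toc (tocid INTEGER PRIMARY KEY, name TEXT, etype INTEGER)"
)
conn.executemany(
    "INSERT INTO toc (name, etype) VALUES (?, ?)",
    [
        ("Invoice 001", -2),
        ("Invoice 001", -2),
        ("Invoice 001", -2),
        ("Report", -2),
        ("Report", -2),
        ("Unique", -2),
        ("Invoice 001", 0),  # different etype, so the filter skips it
    ],
)

rows = find_duplicate_names(conn, limit=10)
for name, copies in rows:
    print(f"{name}: {copies} copies")
```

Sorting by the number of copies puts the worst offenders first, which helps when the limit cuts the list short.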
Then I would iterate through the search results and build a search query for a name search as a token.
The next step depends on whether Web Access is available or not.
Hi Miruna,
Just checking on this, as I'm missing something with the URL. The link covers the first search string, but the subsequent results are left out of it and appear only as plain token text, so the URL isn't being applied to the whole multi-value token. For example:
http://lfserver/laserfiche/index.aspx?db=repository#view=search;search=name1 | name2 | name3 | etc
What am I doing wrong please?
Thanks,
Mike
That's because you have spaces. Most mail clients will underline a URL until they run into a space.
If you right-click the token in your email and choose Token Editor from the context menu, you can then add a function called Encode URI which will encode the spaces so the underlining will still work.
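To show what encoding does to a link like Mike's, here is a small Python sketch. It is not the Workflow token function itself; it uses `urllib.parse.quote` from the standard library as a stand-in, and the `build_search_url` helper is a made-up name for the example. The base URL and the `name1 | name2 | name3` search string are the ones from the post above.

```python
from urllib.parse import quote


def build_search_url(base_url, names):
    """Join the names with ' | ' and percent-encode the result.

    Hypothetical helper: after encoding, the search string contains no
    spaces, so a mail client has no reason to end the link early.
    """
    search = " | ".join(names)
    return base_url + quote(search, safe="")


base = "http://lfserver/laserfiche/index.aspx?db=repository#view=search;search="
url = build_search_url(base, ["name1", "name2", "name3"])
print(url)
# http://lfserver/laserfiche/index.aspx?db=repository#view=search;search=name1%20%7C%20name2%20%7C%20name3
```

Each space becomes `%20` and each `|` becomes `%7C`, so the whole search string stays part of one unbroken link.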
Thanks Miruna
I have been playing with this workflow, which I can set to run once a month. It runs through all the documents looking for duplicate names and records the names of any duplicates in a token, which then gets reported to me in an email.
Last test ran successfully (on targeted test folders).
Connie, that second search looks wrong. Shouldn't it only be searching for documents with the same name as the current entry? As it is, it's searching for all documents.
I've tried a number of configurations, and it wasn't until I added the second loop that it successfully checked each document it found against every other document in the repository.
It does look more complicated than it should be. If you can show me a simplified version, I would love to reconfigure mine.
No, I meant, this search:
It's exactly the same as the first one, it's looking for all documents.
Yes, "For Each Entry", it again needs to search the whole repository for any documents and compare the "Current Entry" with every other document in the repository. Is there another way to do this?
Thanks for posting your workflow, Connie. The environment we're trying to clean up ended up with as many as 20 duplicates in some cases, because multiple people scanned the same documents when a school moved location, so it's good to see other ideas.
I'm not explaining this well. The first search returns all documents in the repository. Then for each one of those, the second search also returns all documents in the repository. So the search count will always be more than 2, regardless of whether the current document in For Each has duplicates or not.
This search is also likely to be slow in any reasonably large repository. For example, in a repository with 1000 documents, you'd be running a search that returns all 1000 documents 1001 times: once for the first search, then once more for each entry in the loop.
A more efficient way to do it would be something like this:
{LF:name="%(ForEachEntry_CurrentEntry_Name)",type="D"} & {LF:id<>%(ForEachEntry_CurrentEntry_ID)}
That way you specifically search for documents that have the same name as the current entry and are not the current entry.
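The logic of that narrower search can be sketched outside Workflow. The Python below is only an illustration with made-up sample data and a hypothetical `find_duplicates` helper, not anything Laserfiche provides. For each entry it keeps only the other entries that share its name, mirroring the "same name, different ID" condition in the search syntax above.

```python
from collections import defaultdict


def find_duplicates(entries):
    """Map each entry ID to the IDs of other entries with the same name.

    `entries` is a list of (entry_id, name) pairs. Entries without a
    namesake are left out of the result.
    """
    # One pass to group IDs by name.
    ids_by_name = defaultdict(list)
    for entry_id, name in entries:
        ids_by_name[name].append(entry_id)

    # Same name AND not the current entry.
    result = {}
    for entry_id, name in entries:
        others = [i for i in ids_by_name[name] if i != entry_id]
        if others:
            result[entry_id] = others
    return result


# Made-up sample data for illustration only.
entries = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "A")]
duplicates = find_duplicates(entries)
print(duplicates)
# {1: [3, 5], 3: [1, 5], 5: [1, 3]}
```

Entries 2 and 4 have no namesake and do not appear in the result, whereas comparing each entry against all documents would flag every one of them.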
How big is the repository? Are all entry names expected to be unique? What do you plan on doing when you find duplicates?
The repository is about 500 GB and will grow into the terabytes later.
Yes, entry names are expected to be the same.
The idea is to put some sort of shortcut in a folder as a notification, and then the user decides what to do with it.