Question

Deleting Duplicate Files

asked on May 19, 2017

I have a workflow that sorts and processes forms that are completed on a third party website and then sent to our server. The third party sends us new forms four times a day. Import Agent picks up the forms from the server and moves them into Laserfiche where the workflow begins automatically. Forms move to a 'Hold Until N Reached' folder and are held there until all eleven of an employee's forms have been submitted. Forms then move to Benefits, HR, and Payroll for processing by staff members.   

 

The problem is that we are receiving duplicates of many of the forms from the third party site, so the move portions of the workflow kick off when eleven forms are present in an employee's folder, even if some of those forms are duplicates of forms that were previously sent to us. The duplicates have the same file names, properties, and contents as the originals except for the (\d) after the file name. We are working with the third party to resolve this.

 

In the meantime, does anyone have any suggestions for automatically deleting duplicates out of my Hold Until N Reached folders?

Our form/file names are formatted as Last, First Middle - BISD- FormName - DateFormCompletedOnThirdPartySite.

Each of the eleven forms has a unique pattern match set up for it. For example, we have \w+, \w+ \w+ - BISD - EMPLOYEE INFORMATION - EMERGENCY CONTACT - \d{1,2}-\d{1,2}-\d{4} for one form and \w+, \w+ \w+ - BISD - EMPLOYEE HANDBOOK ACKNOWLEDGEMENT AND RECEIPT - \d{1,2}-\d{1,2}-\d{4} for another.

I was thinking about adding a conditional parallel between the 'Set 1 Route Entry to Hold Until N Reached' step and the 'Wait Condition- Proceed if Form Count = 11' step, with one branch for file names that are formatted correctly and another for those with a (\d) after them. I would delete entries with the (\d) and terminate the workflow at that point. Would a pattern like \w+, \w+ \w+ - BISD - .+ - \d{1,2}-\d{1,2}-\d{4} [(]\d[)] work to find duplicates and delete them? I would like to avoid having to create 11 new patterns if I can. I could probably also write a separate workflow, apart from the main one, that monitors each newly created 'Hold Until N Reached' folder and deletes duplicates.
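For anyone who wants to sanity-check the single catch-all pattern outside of Workflow, here is a minimal Python sketch. It assumes Python's `re` module treats these constructs the same way the Workflow pattern engine does, which may not hold exactly, and the sample file names and the `is_duplicate` helper are made up for illustration.

```python
import re

# Catch-all duplicate pattern from the question: any form name, a date,
# then a space and a single digit in parentheses, e.g. " (2)".
DUPLICATE = re.compile(
    r"\w+, \w+ \w+ - BISD - .+ - \d{1,2}-\d{1,2}-\d{4} [(]\d[)]"
)

def is_duplicate(name: str) -> bool:
    """Return True only if the whole file name matches the duplicate pattern."""
    return DUPLICATE.fullmatch(name) is not None

# Hypothetical sample names in the documented format.
original = "Doe, Jane Ann - BISD - EMPLOYEE INFORMATION - EMERGENCY CONTACT - 5-19-2017"
copy = original + " (2)"

print(is_duplicate(original))  # no "(n)" suffix
print(is_duplicate(copy))      # has a "(n)" suffix
```

Under these assumptions, one pattern covers all eleven form types because `.+` stands in for the form name. Note that `\d` matches a single digit only, so a tenth or later copy such as "(10)" would need `\d+` instead.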

 

Thank you for your suggestions.

Replies

replied on May 19, 2017

I would not rely on a simple document count to proceed with the workflow.  I would create a Parallel that would have a branch for each document name, so that it does not continue until each document type is present.
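To illustrate why a per-type check is safer than a count, here is a rough Python sketch of the logic. It is not Workflow configuration; `REQUIRED_FORMS`, `missing_forms`, and the sample names are hypothetical, and only the two form names quoted in the question are listed (the other nine would be added the same way).

```python
import re

# Only two of the eleven form names appear in the thread; the rest
# would be listed here as well.
REQUIRED_FORMS = [
    "EMPLOYEE INFORMATION - EMERGENCY CONTACT",
    "EMPLOYEE HANDBOOK ACKNOWLEDGEMENT AND RECEIPT",
]

# Captures the form name and tolerates an optional " (n)" duplicate suffix.
NAME = re.compile(
    r"\w+, \w+ \w+ - BISD - (?P<form>.+) - \d{1,2}-\d{1,2}-\d{4}(?: [(]\d[)])?"
)

def missing_forms(names):
    """Return the required form types that are not present in the folder."""
    present = set()
    for name in names:
        match = NAME.fullmatch(name)
        if match:
            present.add(match.group("form"))
    return [form for form in REQUIRED_FORMS if form not in present]

# Hypothetical folder: two entries, but both are the same form type.
folder = [
    "Doe, Jane Ann - BISD - EMPLOYEE INFORMATION - EMERGENCY CONTACT - 5-19-2017",
    "Doe, Jane Ann - BISD - EMPLOYEE INFORMATION - EMERGENCY CONTACT - 5-19-2017 (2)",
]
print(missing_forms(folder))
```

In this example a count of two would look complete for a two-form requirement, while the per-type check still reports the handbook acknowledgement as missing.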

replied on May 19, 2017

I would agree. If these are stored in an individual folder for each user, then maybe add the document name to a multi-value field on the folder when it arrives, and use that field as a contents list to decide whether a new file is a duplicate.
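The contents-list idea could look roughly like the following Python sketch. A plain list stands in for the multi-value field on the folder, and `base_name` and `register` are hypothetical helpers, not Workflow activities.

```python
import re

# Trailing " (n)" that marks a duplicate copy of a file name.
SUFFIX = re.compile(r" [(]\d+[)]$")

def base_name(name: str) -> str:
    """Strip a trailing " (n)" so copies compare equal to the original."""
    return SUFFIX.sub("", name)

def register(contents: list, name: str) -> bool:
    """Record a newly arrived document in the folder's contents list.

    Returns True if the document is new, or False if its base name is
    already listed, in which case the entry is a candidate for deletion.
    """
    key = base_name(name)
    if key in contents:
        return False
    contents.append(key)
    return True

# Hypothetical arrivals for one employee folder.
contents = []
first = "Doe, Jane Ann - BISD - EMPLOYEE INFORMATION - EMERGENCY CONTACT - 5-19-2017"
print(register(contents, first))           # first arrival
print(register(contents, first + " (2)"))  # same form sent again
```

Comparing on the base name means the check does not depend on which copy arrives first.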

replied on May 24, 2017

Thank you. I added lines to my wait condition so that documents will not move unless all document types are present in each employee's folder and the folders are free of duplicate documents. That has kept things from moving prematurely. Now I need to figure out how to make the workflow delete the duplicates in each folder. I tested out patterns with (2), (3), (4) at the end of file names and the result is the original file name.
