I know how to create a Workflow that merges documents when a new document is created with the same metadata. However, I'd like to create a Workflow that merges documents already in the Repository that share the same metadata. My plan is to do a "Retrieve Field Values", then a Search Repository on DocID (using the DocID from Retrieve Field Values) to find all documents with the same DocID, and then merge those documents into one. The metadata on every document I find is identical, so the metadata shouldn't be an issue. I'm having a tough time, however, because the search finds all the documents, but I can't figure out how to tell "Move Pages" which document to move "from" and which to move "to" to make the merge happen. Any insight would be appreciated.
Question
Replies
Daryl - Off the top of my head, couldn't you step through the search results, treat the document with the earliest creation date as the target, and move the pages of all the other docs to that target doc? I would envision a For Each at the front of the workflow that looks at creation dates and returns the unique DocID for the target document. Does that make sense?
Hi Daryl,
High level, you would set it up like this....
You would need to set up a "For Each Entry" loop that iterates through the search results and performs the move. Then set the Move Pages activity to use the 'current entry'.
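As a rough illustration of that loop, here is a sketch in plain Python rather than Workflow activities. The `Document` class and the `move_pages` helper are hypothetical stand-ins, not Laserfiche APIs, and the sketch assumes the first search result is the target while each later "current entry" is the source:

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a repository document."""
    entry_id: int
    pages: list = field(default_factory=list)


def move_pages(source: Document, target: Document) -> None:
    """Append every page of source to target, leaving source empty."""
    target.pages.extend(source.pages)
    source.pages.clear()


def merge_into_first(search_results: list) -> Document:
    """Treat the first result as the target and move pages from the rest."""
    if not search_results:
        raise ValueError("search returned no documents")
    target = search_results[0]
    for current_entry in search_results[1:]:  # plays the role of "For Each Entry"
        move_pages(source=current_entry, target=target)
    return target


# Made-up search results for illustration only.
results = [
    Document(101, ["a1", "a2"]),
    Document(102, ["b1"]),
    Document(103, ["c1", "c2"]),
]
merged = merge_into_first(results)
print(merged.entry_id, merged.pages)
```

The point is only that "to" is fixed once before the loop and "from" changes on every iteration.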
My issue is that there's an import happening, so the documents could have the same creation date.
Hmmm, the Creation Date field is actually a Date/Time that is accurate to the whole second. So I was envisioning the 'oldest' document to be the target and subsequent document pages would be added to that target in document Creation Date/Time ascending order. (Even if the 'oldest' document was only seconds older than the others) Is that not the case?
Why don't you sort by Entry ID? That would be sequential.
What do you mean Sort by entry ID? Can you give me an example?
In a nutshell, when you do a mass import, two or more documents can end up with the same date/time stamp (even down to the second), so the creation date is not always a reliable way to sort search results. What I am saying is to sort by the entry ID instead, because it will always be in order regardless of whether a document was created in the same second as another.
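To make the tie problem concrete, here is a small Python sketch. The entry IDs and timestamps are invented for illustration; the only assumption is that several imported documents share a creation time to the second:

```python
from datetime import datetime

# Hypothetical search results as (entry_id, creation_time) pairs, listed in
# the arbitrary order a search might return them.
imported = [
    (2043, datetime(2015, 3, 2, 9, 15, 7)),
    (2041, datetime(2015, 3, 2, 9, 15, 7)),
    (2042, datetime(2015, 3, 2, 9, 15, 7)),
    (2040, datetime(2015, 3, 2, 9, 15, 6)),
]

# Python's sort is stable, so documents that tie on creation time simply keep
# the order they arrived in; the timestamp cannot tell them apart.
by_creation = sorted(imported, key=lambda doc: doc[1])

# Entry IDs are unique, so this ordering has no ties.
by_entry_id = sorted(imported, key=lambda doc: doc[0])

print([doc[0] for doc in by_creation])  # [2040, 2043, 2041, 2042]
print([doc[0] for doc in by_entry_id])  # [2040, 2041, 2042, 2043]
```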
In this case, you could use one workflow with a search to find all the documents you want to deal with and write that data to a temporary SQL table. Then have another workflow iterate through the rows in the table and either write a flag or purge the row when the job is done. Writing a flag (a handled date/time or something similar) might be better, as I've had problems with table locks when trying to purge rows using this method. This way you also have a table containing the audit history of what you have done, and you can review it if needed.
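A minimal sketch of that two-workflow pattern, using Python's built-in `sqlite3` with an in-memory database as a stand-in for the SQL table. The table name `merge_queue`, its columns, and the sample rows are all assumptions made for illustration:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE merge_queue (
        entry_id   INTEGER PRIMARY KEY,
        doc_id     TEXT NOT NULL,
        handled_at TEXT            -- NULL until the row has been processed
    )
    """
)

# First workflow: record every document the search found (made-up values).
found = [(2040, "INV-7"), (2041, "INV-7"), (2042, "INV-9")]
conn.executemany("INSERT INTO merge_queue (entry_id, doc_id) VALUES (?, ?)", found)

# Second workflow: walk the unhandled rows in entry ID order and flag each one
# instead of deleting it, so the table doubles as an audit trail.
pending = conn.execute(
    "SELECT entry_id FROM merge_queue WHERE handled_at IS NULL ORDER BY entry_id"
).fetchall()
for (entry_id,) in pending:
    # ... the actual page move would happen here ...
    conn.execute(
        "UPDATE merge_queue SET handled_at = ? WHERE entry_id = ?",
        (datetime.now(timezone.utc).isoformat(), entry_id),
    )
conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM merge_queue WHERE handled_at IS NULL"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM merge_queue").fetchone()[0]
print(remaining, total)
```

Flagging rather than purging keeps all rows in place, which is what makes the later history review possible.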
Daryl - I agree with Chris in that the EntryID will always be unique and in a creation ascending order so that would probably be the best key to use.
Not sure if this is what you are looking for but this seems to work on my test system.
Basically, I am creating a multivalue integer list token to store a list of EntryIDs. I search the repository for all appropriate documents, step through the search results, and add each document's EntryID to the multivalue list. I then sort that multivalue list in ascending order, create a new document, and step through the sorted multivalue list, finding each document by its EntryID and moving its pages to the new document. The result is a single document with the pages from multiple documents in order from lowest EntryID to highest EntryID.
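The same sequence of steps can be sketched in Python for anyone who finds code easier to follow than a list of activities. This is not how Workflow runs it internally; the `repository` dictionary, the entry IDs, and the `merge_by_entry_id` function are hypothetical:

```python
# Hypothetical repository: EntryID -> list of pages.
repository = {
    3107: ["c1"],
    3101: ["a1", "a2"],
    3104: ["b1", "b2"],
}


def merge_by_entry_id(repo, found_ids, new_entry_id):
    """Collect EntryIDs, sort them ascending, and move pages into a new document."""
    entry_ids = []                   # plays the role of the multivalue integer token
    for entry_id in found_ids:       # step through the search results
        entry_ids.append(entry_id)
    entry_ids.sort()                 # the sorting step done in 'Assign Token Value'
    repo[new_entry_id] = []          # create the new, empty target document
    for entry_id in entry_ids:       # step through the sorted list
        repo[new_entry_id].extend(repo[entry_id])
        repo[entry_id] = []          # pages were moved, so the source is now empty
    return repo[new_entry_id]


# The search is assumed to return IDs in no particular order.
merged_pages = merge_by_entry_id(repository, [3107, 3101, 3104], new_entry_id=3200)
print(merged_pages)  # ['a1', 'a2', 'b1', 'b2', 'c1']
```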
The key is the 'Assign Token Value' activity, where the sorting of the EntryIDs takes place. Here is a screenshot:
I can think of other ways to accomplish the same result, but this one seems to work as anticipated and uses only native workflow activities (no scripting or writing to external tables).
If this looks promising for you and you have additional questions let me know...
Just a note about Entry ID vs. Creation Date: if you are bringing any of these documents into the repository through a Briefcase from another repository, they will retain the creation date from the original repository while being assigned a new entry ID in the new repository, which could be higher than the entry IDs of documents with newer creation dates.
Another good point Bert! Which is why it's probably a better idea to use the Entry ID rather than creation date in situations like this.