Merging Docs with same metadata using Workflow

asked on March 19, 2015

I know how to create a Workflow that merges docs when a new document is created with the same metadata. However, I'd like to create a Workflow that merges documents already in the Repository that share the same metadata. My plan is to do a "Retrieve Field Values", then a "Search Repository" on DocID (using the DocID from Retrieve Field Values) to find all documents with the same DocID, and then merge those documents into one. The metadata on every document found is identical, so the metadata shouldn't be an issue. However, I'm having a tough time: the search finds all the documents, but I can't figure out how to tell "Move Pages" which document to move "from" and which to move "to" to make the merge happen. Any insight would be appreciated.


replied on March 19, 2015

Hi Daryl,

At a high level, you would set it up like this:

You would need to set up a "For Each Entry" loop to iterate through the search results and perform the move. Then point the "Move Pages" activity at the 'current entry'.



replied on March 19, 2015

My issue is that there's an import happening, so the documents could have the same creation date.


replied on March 19, 2015

Hmmm, the Creation Date field is actually a Date/Time that is accurate to the whole second. So I was envisioning the 'oldest' document to be the target and subsequent document pages would be added to that target in document Creation Date/Time ascending order. (Even if the 'oldest' document was only seconds older than the others) Is that not the case?


replied on March 20, 2015

Why don't you sort by Entry ID? Those would be sequential.


replied on March 20, 2015

What do you mean by sorting by Entry ID? Can you give me an example?


replied on March 21, 2015

In a nutshell, when you do a mass import, two or more documents can end up with the same date/time stamp (even down to the second), so the creation date isn't always a reliable way to sort search results. What I am saying is to sort by the Entry ID instead, because it will always be in order regardless of whether a document was created in the same second as another.
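To illustrate the point about ties, here is a minimal Python sketch. It is not Workflow itself; the entry IDs and timestamps are made-up values, and the tuples simply stand in for search results.

```python
from datetime import datetime

# Hypothetical search results as (entry_id, creation_time) pairs.
# All three documents came from a bulk import within the same second.
results = [
    (1042, datetime(2015, 3, 19, 10, 15, 7)),
    (1040, datetime(2015, 3, 19, 10, 15, 7)),
    (1041, datetime(2015, 3, 19, 10, 15, 7)),
]

# Sorting by creation time cannot separate the tied rows: they stay in
# whatever order the search happened to return them.
by_date = sorted(results, key=lambda r: r[1])

# Sorting by entry ID gives one repeatable order, because IDs never repeat.
by_id = sorted(results, key=lambda r: r[0])

ids_by_date = [r[0] for r in by_date]
ids_by_id = [r[0] for r in by_id]
print(ids_by_date)
print(ids_by_id)
```

With identical timestamps, the date sort tells you nothing about which document should be the merge target, while the ID sort always picks the same one.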

In this case, what you could do is use one workflow with a search to find all the documents you want to deal with and write that data to a temporary SQL table. Then have another workflow iterate through the rows in the table and either write a flag or purge the row when the job is done. Writing a flag (a handled date/time or something similar) might be better, as I've had problems with table locks when trying to purge rows using this method. This way you also have a table containing the audit history of what you have done, and you can review it if needed.
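The queue-table idea could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely for illustration; the table name `merge_queue`, its columns, and the helper functions are hypothetical, and a real setup would use your own SQL Server table with the Workflow query activities.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE merge_queue (
        entry_id   INTEGER PRIMARY KEY,  -- one row per document found
        doc_id     TEXT NOT NULL,        -- the shared metadata value
        handled_at TEXT                  -- NULL until the row is processed
    )
    """
)

def queue_results(conn, doc_id, entry_ids):
    """First workflow: record every document the search found."""
    conn.executemany(
        "INSERT OR IGNORE INTO merge_queue (entry_id, doc_id) VALUES (?, ?)",
        [(eid, doc_id) for eid in entry_ids],
    )
    conn.commit()

def pending(conn, doc_id):
    """Second workflow: unhandled rows, lowest entry ID first."""
    rows = conn.execute(
        "SELECT entry_id FROM merge_queue "
        "WHERE doc_id = ? AND handled_at IS NULL ORDER BY entry_id",
        (doc_id,),
    )
    return [r[0] for r in rows]

def mark_handled(conn, entry_id):
    """Flag the row instead of deleting it, so the table keeps a history."""
    conn.execute(
        "UPDATE merge_queue SET handled_at = ? WHERE entry_id = ?",
        (datetime.now(timezone.utc).isoformat(), entry_id),
    )
    conn.commit()

queue_results(conn, "INV-001", [1042, 1040, 1041])
todo = pending(conn, "INV-001")
for eid in todo:
    # The actual page move for this entry would happen here.
    mark_handled(conn, eid)
```

Because rows are flagged rather than purged, rerunning the second step picks up only the rows that are still unhandled, and the flagged rows remain as a record of what was merged.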


replied on March 21, 2015

Daryl - I agree with Chris: the EntryID will always be unique and assigned in ascending order of creation, so it would probably be the best key to use.

Not sure if this is what you are looking for, but this seems to work on my test system.

Basically, I am creating a multivalue integer list token to store a list of EntryIDs and searching the repository for all appropriate documents. I then step through the search results, add each document's EntryID to the multivalue list, and sort that list in ascending order. After that I create a new document and step through the sorted list, finding each document by its EntryID and moving its pages to the new document. The result is a single document with the pages from multiple documents in order from lowest EntryID to highest EntryID.

The key is the 'Assign Token Value' activity, where the sorting of the EntryIDs takes place. Here is a screenshot:

I can think of other ways to accomplish the same result, but this seems to work as anticipated and uses only native workflow activities (no scripting or writing to external tables).
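For anyone who finds it easier to read as code, the steps described above can be sketched in Python as follows. This only models the logic, not the actual Workflow activities: the `Document` class, the `merge_by_entry_id` function, and the sample data are all hypothetical, and it assumes that moving pages empties the source document.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    entry_id: int
    doc_id: str
    pages: list = field(default_factory=list)

def merge_by_entry_id(repository, doc_id, new_entry_id):
    # Search step: find every document with the matching DocID.
    matches = [d for d in repository if d.doc_id == doc_id]

    # Collect the entry IDs (the multivalue token) and sort ascending.
    entry_ids = sorted(d.entry_id for d in matches)

    # Create the new target document.
    merged = Document(entry_id=new_entry_id, doc_id=doc_id)

    # Step through the sorted IDs, find each document, and move its pages.
    by_id = {d.entry_id: d for d in matches}
    for eid in entry_ids:
        source = by_id[eid]
        merged.pages.extend(source.pages)
        source.pages.clear()  # assumption: a move leaves the source empty
    return merged

repo = [
    Document(1042, "INV-001", ["c1"]),
    Document(1040, "INV-001", ["a1", "a2"]),
    Document(2000, "INV-002", ["x1"]),
    Document(1041, "INV-001", ["b1"]),
]
merged = merge_by_entry_id(repo, "INV-001", new_entry_id=3000)
print(merged.pages)
```

The pages end up in the target in entry ID order no matter what order the search returned the documents in, and documents with a different DocID are left alone.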

If this looks promising for you and you have additional questions, let me know.


replied on March 23, 2015

Just a note about Entry ID vs. Creation Date: if you are bringing any of these documents into the repository through a Briefcase from another repository, they will retain the creation date from the original repository while being assigned a new Entry ID in the new repository. That Entry ID could be higher than those of documents with newer creation dates.


replied on March 23, 2015

Another good point, Bert! That's why it's probably a better idea to use the Entry ID rather than the creation date in situations like this.

