Question

Merging Docs with same metadata using Workflow

asked on March 19, 2015

So I know how to create a Workflow that merges documents when a new document is created with the same metadata as an existing one. However, I'd like to create a Workflow that merges documents that are already in the repository and share the same metadata. My plan is to do a "Retrieve Field Values", then a "Search Repository" on DocID (using the DocID value from Retrieve Field Values) to find all documents with the same DocID, and then merge those documents into one document. The metadata on each document I find is identical, so the metadata shouldn't be an issue. However, I'm having a really tough time: when I perform the search, it finds all the documents, but I can't figure out how to tell "Move Pages" which document to move pages "from" and which to move them "to" to make the merge happen. Any insight would be appreciated.

Replies

replied on March 19, 2015

Daryl - Off the top of my head, couldn't you step through the search results, make the document with the earliest creation date the target, and move the pages of all the other documents to that target document? I would envision a For Each loop at the front of the workflow that looks at the creation dates and returns the unique ID of the target document. Does that make sense?
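Workflow is configured in the designer rather than written as code, so the following is only a Python model of the target-selection logic suggested above, assuming each search result exposes an Entry ID, the DocID value, and a creation date/time. The `Doc` class and `plan_merges` function are hypothetical names for illustration, not Laserfiche APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Doc:
    """Hypothetical stand-in for one search result (not a Laserfiche API type)."""
    entry_id: int
    doc_id: str        # the shared DocID metadata value
    created: datetime  # the entry's creation date/time

def plan_merges(search_results):
    """Group results by DocID; in each group the earliest-created document
    is the target and the rest are sources, oldest first."""
    groups = defaultdict(list)
    for doc in search_results:
        groups[doc.doc_id].append(doc)
    plans = {}
    for doc_id, docs in groups.items():
        ordered = sorted(docs, key=lambda d: d.created)
        plans[doc_id] = (ordered[0], ordered[1:])
    return plans

results = [
    Doc(12, "A100", datetime(2015, 3, 19, 9, 0, 5)),
    Doc(10, "A100", datetime(2015, 3, 19, 9, 0, 1)),
    Doc(11, "A100", datetime(2015, 3, 19, 9, 0, 3)),
]
target, sources = plan_merges(results)["A100"]
print(target.entry_id, [d.entry_id for d in sources])  # 10 [11, 12]
```

Note that this picks the target by creation date alone, so it has no defined winner when two documents share a timestamp.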

replied on March 19, 2015

Hi Daryl,

At a high level, you would set it up like this:

You would need to set up a "For Each Entry" loop that iterates through the search results and performs the move. Then set Move Pages to use the 'Current Entry'.
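As a rough model of that loop, here is a Python sketch that assumes the target document has already been chosen before the loop and that each iteration's current entry is the source for Move Pages. Documents are modeled as plain dicts with a list of pages; `move_pages` and `merge_search_results` are hypothetical helpers that only imitate what the activities do.

```python
def move_pages(source, target):
    """Hypothetical model of the Move Pages activity: append every page
    of the source document to the end of the target, leaving the source empty."""
    target["pages"].extend(source["pages"])
    source["pages"].clear()

def merge_search_results(target, search_results):
    """Model of a For Each Entry loop: each iteration's 'current entry'
    is the source for Move Pages; the target is fixed before the loop."""
    for current_entry in search_results:
        if current_entry is target:
            continue  # the search also returns the target; don't move it onto itself
        move_pages(current_entry, target)
    return target

doc_a = {"id": 10, "pages": ["a1", "a2"]}
doc_b = {"id": 11, "pages": ["b1"]}
doc_c = {"id": 12, "pages": ["c1", "c2"]}
merged = merge_search_results(doc_a, [doc_a, doc_b, doc_c])
print(merged["pages"])  # ['a1', 'a2', 'b1', 'c1', 'c2']
```

The skip check matters because the search for a DocID returns the target document too.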

replied on March 19, 2015

My issue is that there's an import happening, so the documents could have the same creation date.

replied on March 19, 2015

Hmmm, the Creation Date field is actually a date/time that is accurate to the whole second. So I was envisioning the oldest document as the target, with the pages of subsequent documents added to that target in ascending order of creation date/time (even if the oldest document was only seconds older than the others). Is that not the case?
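To see why a whole-second timestamp can still be ambiguous during an import, here is a small Python sketch with two hypothetical entries created about 300 ms apart. It assumes, as stated above, that the stored creation date is accurate only to the second; the sub-second values and Entry IDs are invented for illustration.

```python
from datetime import datetime

# Two hypothetical entries imported ~300 ms apart within the same second
imports = [
    {"entry_id": 501, "created": datetime(2015, 3, 19, 14, 22, 7, 100000)},
    {"entry_id": 502, "created": datetime(2015, 3, 19, 14, 22, 7, 400000)},
]

def to_whole_second(ts):
    """Drop sub-second precision, as a date/time accurate only to the second would."""
    return ts.replace(microsecond=0)

stamps = [to_whole_second(d["created"]) for d in imports]
print(stamps[0] == stamps[1])  # True: both look equally "oldest"

def oldest(docs):
    # min() returns the first of several equal keys, so a tie is decided by input order
    return min(docs, key=lambda d: to_whole_second(d["created"]))["entry_id"]

print(oldest(imports), oldest(imports[::-1]))  # 501 502
```

With tied timestamps, the chosen target depends only on the order in which the search returns the results.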

replied on March 20, 2015

Why don't you sort by Entry ID? That would always be sequential, wouldn't it?

replied on March 20, 2015

What do you mean by sort by Entry ID? Can you give me an example?

replied on March 21, 2015

In a nutshell, when you do a mass import, two or more documents can get the same date/time stamp (even down to the second), so sorting search results by creation date isn't always reliable. What I'm saying is to sort by Entry ID instead, because Entry IDs will always be in order, even if two documents were created in the same second.
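A minimal Python sketch of the difference, assuming (as this reply says) that Entry IDs are unique and assigned in ascending order. The three search results are hypothetical entries from a mass import that all share one creation second.

```python
from datetime import datetime

# Hypothetical search results from a mass import: all three share one second
results = [
    {"entry_id": 733, "created": datetime(2015, 3, 21, 8, 15, 42)},
    {"entry_id": 731, "created": datetime(2015, 3, 21, 8, 15, 42)},
    {"entry_id": 732, "created": datetime(2015, 3, 21, 8, 15, 42)},
]

# sorted() is stable, so tied creation dates keep whatever order the search returned
by_date = [d["entry_id"] for d in sorted(results, key=lambda d: d["created"])]

# Entry IDs are unique, so sorting on them always yields one well-defined order
by_entry_id = [d["entry_id"] for d in sorted(results, key=lambda d: d["entry_id"])]

print(by_date)      # [733, 731, 732]
print(by_entry_id)  # [731, 732, 733]
```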

In this example, what you could do is use a workflow with a search to find all the documents you want to deal with and write this data to a temporary SQL table. Then have another workflow iterate through the rows in that table and either write a flag or purge each row when its job is done. Writing a flag (a handled date/time or something similar) might be better, as I've had problems with table locks when trying to purge rows this way. This also leaves you with a table containing an audit history of what you have done, which you can review if needed.
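A sketch of the flag-instead-of-purge queue table, using Python's built-in `sqlite3` in memory as a stand-in for the real SQL table. The table name `merge_queue`, its columns, and the `process_pending` helper are hypothetical; in practice the two workflows would do the insert and update steps.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")  # stand-in for the real SQL table
conn.execute("""
    CREATE TABLE merge_queue (
        entry_id   INTEGER PRIMARY KEY,
        doc_id     TEXT NOT NULL,
        handled_at TEXT              -- NULL until the row has been processed
    )
""")

# First workflow (hypothetical): write each search result to the queue table
search_results = [(731, "A100"), (732, "A100"), (733, "B200")]
conn.executemany("INSERT INTO merge_queue (entry_id, doc_id) VALUES (?, ?)", search_results)
conn.commit()

def process_pending(conn, handle):
    """Second workflow (hypothetical): handle each unflagged row in Entry ID
    order, then stamp handled_at instead of deleting the row."""
    rows = conn.execute(
        "SELECT entry_id, doc_id FROM merge_queue WHERE handled_at IS NULL ORDER BY entry_id"
    ).fetchall()
    for entry_id, doc_id in rows:
        handle(entry_id, doc_id)  # e.g. move this entry's pages to its DocID's target
        conn.execute(
            "UPDATE merge_queue SET handled_at = ? WHERE entry_id = ?",
            (datetime.now().isoformat(timespec="seconds"), entry_id),
        )
        conn.commit()  # commit per row so each flag is saved as soon as its job is done
    return len(rows)

processed = []
print(process_pending(conn, lambda e, d: processed.append(e)))  # 3
print(process_pending(conn, lambda e, d: processed.append(e)))  # 0: every row is flagged
```

Because rows are only flagged, a rerun skips finished work and the table doubles as the audit history described above.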

replied on March 21, 2015

Daryl - I agree with Chris that the EntryID will always be unique and in ascending creation order, so it would probably be the best key to use.

Not sure if this is what you are looking for but this seems to work on my test system. 

Basically, I create a multivalue integer list token to store a list of EntryIDs, search the repository for all the appropriate documents, and step through the search results, adding each document's EntryID to the multivalue list. I then sort that list in ascending order, create a new document, and step through the sorted list, finding each document by its EntryID and moving its pages to the new document. The result is a single document containing the pages from multiple documents, in order from lowest EntryID to highest EntryID.

The key is the 'Assign Token Value' activity, where the EntryIDs are sorted. Here is a screenshot:

I can think of other ways to accomplish the same result, but this seems to work as anticipated and uses only native Workflow activities (no scripting or writing to external tables).

If this looks promising and you have additional questions, let me know.
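The multivalue-token approach in this reply can be modeled in Python roughly as follows. This is a sketch of the control flow only: the list stands in for the multivalue integer token, `entry_ids.sort()` for the sorting done in Assign Token Value, and `merge_into_new_document` and the `repo` lookup are hypothetical helpers, not Workflow activities.

```python
def merge_into_new_document(search_results, find_by_entry_id):
    # 1. Collect each result's EntryID into a list (the multivalue integer token)
    entry_ids = []
    for result in search_results:
        entry_ids.append(result["entry_id"])
    # 2. Sort ascending (the step done in Assign Token Value)
    entry_ids.sort()
    # 3. Create a new, empty document to receive the pages
    new_doc = {"entry_id": None, "pages": []}
    # 4. Step through the sorted IDs, find each document, and move its pages
    for entry_id in entry_ids:
        source = find_by_entry_id(entry_id)
        new_doc["pages"].extend(source["pages"])
        source["pages"] = []
    return new_doc

# Hypothetical repository keyed by EntryID, deliberately listed out of order
repo = {
    733: {"entry_id": 733, "pages": ["c1"]},
    731: {"entry_id": 731, "pages": ["a1", "a2"]},
    732: {"entry_id": 732, "pages": ["b1"]},
}
merged = merge_into_new_document(list(repo.values()), repo.__getitem__)
print(merged["pages"])  # ['a1', 'a2', 'b1', 'c1']
```

Merging into a new document, rather than into one of the existing ones, means no search result needs to be singled out as the target.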

replied on March 23, 2015

Just a note about Entry ID vs. Creation Date: if you bring any of these documents into the repository through a Briefcase from another repository, they will retain their creation date from the original repository but be assigned a new Entry ID in the new repository, which could be higher than the Entry IDs of documents with newer creation dates.
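A small Python illustration of this point, using two hypothetical entries: one created locally, and one imported later by Briefcase that kept an older creation date but received a higher Entry ID. The IDs and dates are invented; the sketch only shows that the two sort keys can produce opposite orders.

```python
from datetime import datetime

docs = [
    # created in this repository
    {"entry_id": 1200, "created": datetime(2015, 3, 16, 10, 0, 0)},
    # imported later by Briefcase: original (older) creation date, new higher Entry ID
    {"entry_id": 1450, "created": datetime(2014, 11, 2, 9, 30, 0)},
]

by_created = [d["entry_id"] for d in sorted(docs, key=lambda d: d["created"])]
by_entry_id = [d["entry_id"] for d in sorted(docs, key=lambda d: d["entry_id"])]

print(by_created)   # [1450, 1200]
print(by_entry_id)  # [1200, 1450]
```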

replied on March 23, 2015

Another good point, Bert! Which is why it's probably a better idea to use the Entry ID rather than the creation date in situations like this.
