
Question

Best approach to bulk remove tags

asked on February 24, 2016

Hi all,

I have a retired workflow that has tagged documents across the repository with one or more tags from a set created specifically for that purpose, which is now redundant.

About 25,000 documents are tagged with one or more of these tags.

I want to completely remove these tags from the system, and I expect that a short workflow run once overnight would do the trick.

I have search syntax that returns them all like this:

(({LF:Tags="xl"} | {LF:Tags="y"} | {LF:Tags="z"} | {LF:Tags="a"}) & {LF:LOOKIN="RepoRoot\"}) - {LF:LOOKIN="RepoRoot\archive"}
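If the same tag list has to drive both the search and the removal, generating the search string keeps the two in sync. The sketch below is a hypothetical Python helper (`build_tag_search` is not part of Laserfiche); it only reproduces the string format of the query above, and the tag names and folder paths are the placeholders from the post.

```python
def build_tag_search(tags, look_in, exclude):
    """Build a search string in the same format as the query above.

    Hypothetical helper: it only assembles text. Tag names and folder
    paths are placeholders; values containing double quotes are not escaped.
    """
    # OR together one {LF:Tags="..."} clause per tag.
    tag_clause = " | ".join(f'{{LF:Tags="{tag}"}}' for tag in tags)
    # Restrict to the repository root, then subtract the archive folder.
    return (
        f'(({tag_clause}) & {{LF:LOOKIN="{look_in}"}})'
        f' - {{LF:LOOKIN="{exclude}"}}'
    )


if __name__ == "__main__":
    print(build_tag_search(["xl", "y", "z", "a"], "RepoRoot\\", "RepoRoot\\archive"))
```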

So should I simply retrieve all the entries, remove all tags and end?

Given the number of entries, should I split it up or just let it run in the wee hours and get it over with?

Finally, can I then delete the redundant tags in the admin safely once and for all?

Cheers,

W

[Screenshot: Screen Shot 2016-02-25 at 11.38.25 AM.png]

Answer

SELECTED ANSWER
replied on February 24, 2016

That's not going to be short if it has to run 25,000 iterations. The End WF activity before the end of the workflow is redundant.

It will run faster if you break it up into chunks of about 1000 search results by limiting the number of results returned in the first activity. You can run it on a schedule, spaced out an hour or so apart. That way each workflow instance would track less data. Then move the Remove Tags activity into its own workflow that's invoked inside the For Each loop. (I realize it probably sounds counter-intuitive to make something more efficient by making it do more, but this takes advantage of the parallel processing capabilities of Workflow: rather than one instance doing all the work, the main workflow acts more like a coordinator and "outsources" the work to multiple parallel instances.)
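The coordinator/worker split described above can be modeled outside Laserfiche to show its shape. This is a hedged Python sketch, not Workflow configuration: an in-memory dictionary stands in for the repository, the hypothetical `search_tagged` stands in for the search activity capped at 1000 results, and the hypothetical `remove_tags` stands in for the invoked child workflow.

```python
from concurrent.futures import ThreadPoolExecutor


def search_tagged(repo, tags, limit):
    """Return up to `limit` entry IDs that still carry any of `tags`.

    `repo` maps entry ID -> set of tag names (hypothetical stand-in for
    the repository and a search activity with a result cap).
    """
    hits = [entry_id for entry_id in sorted(repo) if repo[entry_id] & tags]
    return hits[:limit]


def remove_tags(repo, entry_id, tags):
    """Stand-in for the invoked child workflow: untag one entry."""
    repo[entry_id].difference_update(tags)
    return entry_id


def run_batch(repo, tags, batch_size=1000, workers=8):
    """One scheduled run: the parent only searches and hands out work."""
    batch = search_tagged(repo, tags, batch_size)
    # Each entry goes to a separate worker, like one child workflow
    # instance per loop iteration. Leaving the `with` block waits for the
    # workers; that wait exists only so this sketch finishes deterministically.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(lambda entry_id: remove_tags(repo, entry_id, tags), batch))
    return len(done)
```

Only tags in the redundant set are removed; any other tags on an entry are left alone.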


Replies

replied on February 24, 2016

I get it: by retrieving only the first 1000 tagged entry IDs, the entries already untagged by previous runs of the workflow won't be picked up and don't have to be considered. It will keep repeating until the workflow finds nothing left to do. I'll set that up now. Thanks for the guidance and inspiration so quickly!
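The point in this reply, that each run only sees entries the earlier runs have not yet untagged, is what makes the repeated runs converge. A small simulation shows it. This is hypothetical Python, not Laserfiche code: a dictionary of entry ID to tag set stands in for the repository, and `runs_until_clean` is an illustrative name.

```python
def runs_until_clean(repo, tags, batch_size=1000):
    """Simulate scheduled runs until the capped search returns nothing.

    `repo` maps entry ID -> set of tags (hypothetical stand-in for the
    repository). Returns the number of runs that found work to do.
    """
    runs = 0
    while True:
        # Each run repeats the capped search, so entries untagged by
        # earlier runs no longer match and are never revisited.
        batch = [e for e in sorted(repo) if repo[e] & tags][:batch_size]
        if not batch:
            return runs  # a run that finds nothing means the job is done
        for entry_id in batch:
            repo[entry_id].difference_update(tags)
        runs += 1
```

One consequence of this design worth checking: an entry whose tag removal keeps failing will still match, so it will be picked up again on every run.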

W

replied on February 25, 2016

I have some questions from trying to implement this, with the tag removal in its own workflow that is invoked on each iteration of a "For Each Entry" loop, initialised with up to 1000 entries returned by the original search activity.

1) I have never invoked another workflow before, and I've been reading all the documentation to understand how the invoked workflow's input parameter gets set up for this.

First I created a simple workflow containing only an Assign Tags activity, set to remove the tags.

I set up a token in the Input/Output Parameters, named UnTag, with no default value and with tags "String,Document".

When I try to assign an entry ID to the Assign Tags activity, I don't see how to specify it: the only options are Starting Entry or Other Activity, and there is no activity I can select that uses the UnTag input parameter as the entry ID of the target document.

When I publish the mini-workflow (without setting a schedule or starting condition) and then refresh the parameters in the main workflow's For Each Entry loop, I can see and select the UnTag input parameter without a problem, and I assigned it the loop's Current Entry ID. So far so good.

However, I can also see a Starting Entry setting at the bottom of that Invoke Workflow activity, where I have specified Other Entry: "Current Entry" of the current loop activity.

I have not set the workflow to wait for the invoked workflow to complete, as that seems to defeat the purpose of leveraging parallel processing.

2) While reading the documentation, I noticed this note in the admin guide:

https://www.laserfiche.com/support/webhelp/Laserfiche/10/en-us/administration/Default.htm#../Subsystems/LFWorkflow/Content/Resources/Activities/Invoke Workflow.htm%3FTocPath%3DWorkflow%7CWorkflow%2520Designer%7CBuilding%2520Workflows%2520and%2520Business%2520Processes%7CActivities%7CH-R%7CInvoke%2520Workflow%7C_____0

"Note: In order to help prevent runaway workflows, more than 32 Invoke Workflow activities cannot be strung together."

 

So I'm wondering what will happen when I let my workflow loose: 1000 iterations every hour, with no other logic for handling the conditional decision other than:

3) If I run it every hour, then even if it spawns 1000 instances of the invoked workflow within that hour, it will take over 25 hours to complete. So I figured maybe I could run it every half hour at night, when nothing else is running?
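The timing estimate is simple arithmetic. This hedged Python sketch (`schedule_estimate` is an illustrative name) assumes each run finishes its batch before the next scheduled start and leaves out the final batch's own processing time.

```python
import math


def schedule_estimate(total_entries, batch_size, interval_minutes):
    """Return (runs needed, hours from first to last scheduled start).

    Assumes every run completes its batch before the next one starts;
    the final batch's processing time is not included.
    """
    runs = math.ceil(total_entries / batch_size)
    hours = (runs - 1) * interval_minutes / 60
    return runs, hours


print(schedule_estimate(25000, 1000, 60))  # hourly runs
print(schedule_estimate(25000, 1000, 30))  # half-hourly runs
```

With hourly runs that is 25 runs spanning about 24 hours between the first and last start; halving the interval halves the span to about 12 hours. The number of runs only drops if each run handles more entries.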

 

Then I stopped and thought I'd better get this checked out by the pros.

Here is the workflow. I've tried to be as explicit as possible without exposing all the parameters, for client privacy reasons.

The details I have for the UnTagging Workflow Activity are:

And the UnTagging Workflow looks like this:

With the following setup:

The Input/Output Parameters look like this:

I'm too unsure to start trying anything out, so I'll hold off until I've discussed this thoroughly with you. Thank you very much for the assistance.

Is the input parameter only a formality, given that you already set the starting entry by identifying the loop's current entry when you invoke the workflow?

Cheers

W

replied on February 29, 2016

Bump: can someone please look at my follow-up questions above and point me in the right direction? I know it's a lot of information, but I think it comes down to a simple conceptual step I'm not grasping.

W

replied on August 1, 2018

I have actually run into a similar situation and want to know how to schedule a workflow and have it spawn others when no starting entry is declared (since the first workflow is started by a schedule).
