
Question

Updating template fields via SQL

asked on July 7, 2023

Hi,

We are on Laserfiche version 10.4. I am trying to update over 1 million documents with a new template, so I opted to do it via SQL... (Of course, the standard disclaimer applies here: this is not really recommended, you should back up before attempting anything like this, etc.)

I managed to do the update via SQL, and I can see the assigned templates did change. I followed these steps:

  • Stop Laserfiche Server
  • Run SQL script
  • Start Laserfiche Server 

 

HOWEVER, as soon as I open the metadata screen on a doc and close it, the changes made via SQL are reverted. I am thinking maybe it is the cache... but shouldn't that have been cleared, since I stopped the LF service before I ran the SQL script?

Any advice will be greatly appreciated.

Many thanks


Answer

SELECTED ANSWER
replied on July 7, 2023

Just to note something with regard to the long workflow processing times: you can drastically reduce processing time if you use invoked child workflows with iteration limits and make the updates in parallel batches.

Depending on how your Workflow Server is configured, you can run multiple parallel processes without noticeable performance degradation.

Basically,

  1. Create a workflow that runs a Search Repository activity, but set it to return only a limited number of results, because too many loop iterations will cause the instance to slow down over time (I usually opt for a max of 250).
  2. Set an input parameter that you can use to separate your batches.
  3. Set an output parameter to return the result count of the search.
  4. Configure a search for documents with the template you want to change, and loop through the results to update the template.

To handle the batching, you can use advanced search syntax to search not only for the template, but also for an additional value, so the documents can be split into groups.

For example, you might run them in 10 parallel batches based on the last digit of the entry ID, with syntax like this:

{[TemplateName]} & {LF:ID=*0}

Then you can use the input parameter instead of a static number, so parallel instances can work on separate groups without overlap:

{[TemplateName]} & {LF:ID=*%(Input Parameter)}

Just make sure you don't pass a blank/empty input parameter, because that would cause undesirable results in the search.
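For reference, the {LF:ID=*0} clause matches entries whose ID ends in 0. A minimal SQL sketch of the same partition, assuming (hypothetically) that the old template's pset_id is 258; substitute your repository's actual value:

-- Hypothetical: documents still on the old template whose entry id ends in 0
SELECT tocid
  FROM [dbo].[toc]
 WHERE pset_id = 258      -- old template id (assumed value for illustration)
   AND tocid % 10 = 0;    -- last digit of the entry id = batch 0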

Next, create the "parent" workflow:

  1. Create a workflow with multiple branches for the parallel batches.
  2. Add a Repeat activity to each branch.
  3. Add an Invoke Workflow activity inside each Repeat.
    1. Set the input parameter for each distinct branch (for example, following the earlier search syntax, you could do 10 branches passing in 0-9 as the input).
    2. Make sure "wait for workflow" is checked so it waits for the results and gets the output value.
  4. Set the Repeat activity to check after each iteration, and set it to repeat as long as the output of the invoked workflow (i.e., how many results the search found) is greater than 0. (See the monitoring sketch after this list.)
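If you want to watch the batches drain while the parent workflow runs, here is a minimal monitoring sketch in SQL. It again assumes the old template's pset_id is 258, which is a hypothetical stand-in for your repository's actual value:

-- Hypothetical monitoring query: documents still on the old template,
-- grouped by last-digit batch (assumes old pset_id = 258)
SELECT tocid % 10 AS batch_digit,
       COUNT(*)   AS remaining
  FROM [dbo].[toc]
 WHERE pset_id = 258
 GROUP BY tocid % 10
 ORDER BY batch_digit;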

 

The reason most workflows like that take so long is that the more iterations you have, the more instance data is collected, and eventually it starts to impact performance. For example, a workflow might start at milliseconds per document, but after 1,000+ iterations each one might take seconds or even minutes.

By breaking it up into child workflows you can limit how much is handled by each instance to avoid the performance degradation, and by breaking it into parallel branches you can increase throughput significantly.

There's no hard limit on how many iterations it takes for performance to degrade, because it depends on what activities are being run and how much data is accumulated, but as a general rule of thumb I try to limit things to no more than 250 iterations per instance.


Replies

replied on July 7, 2023

It's not possible to guess what the issue is without seeing your SQL script.

However, the standard disclaimer applies: direct modifications of the SQL database are not supported.

replied on July 7, 2023

Hi Miruna,

The update script is below (only testing on one document at the moment).

After some more testing, what actually triggers the reverting of the template is the following scenario: if you open the "Metadata..." window in the desktop client and change one of the template field values, but then cancel without saving the update, it triggers a revert to the old template. This does not happen in the web client.

 

/******************************************************/
-- Update the doc to the new template id
UPDATE [dbo].[toc]
   SET [pset_id] = 259
 WHERE tocid = 3804945;
GO

/******************************************************/
-- Delete any current fields that are not part of the new template
DELETE FROM [dbo].[propval]
 WHERE tocid = 3804945
   AND prop_id NOT IN (13, 14);
GO

/******************************************************/
-- The new template's fields are entirely new fields.
-- Although they are named differently in the new template, their types are consistent with the current fields.
/******************************************************/
-- Update the field ids on the document to point to the new fields.
-- prop_id 13 (used in the old template) maps to prop_id 303 in the new template
UPDATE [dbo].[propval]
   SET [prop_id] = 303
 WHERE [tocid] = 3804945
   AND prop_id = 13;

-- prop_id 14 (used in the old template) maps to prop_id 305 in the new template
UPDATE [dbo].[propval]
   SET [prop_id] = 305
 WHERE [tocid] = 3804945
   AND prop_id = 14;
GO
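To confirm the change took, a quick sanity check against the same tables and ids used above:

-- Expect pset_id = 259 for the test document
SELECT tocid, pset_id
  FROM [dbo].[toc]
 WHERE tocid = 3804945;

-- Expect only prop_id 303 and 305 to remain
SELECT tocid, prop_id
  FROM [dbo].[propval]
 WHERE tocid = 3804945;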

 

I fully appreciate that it is not supported, but I am trying to find an approach that won't take weeks. I initially built a workflow to update the templates, but it literally runs for days, so I am looking for better alternatives.

Many thanks

 

replied on July 10, 2023

Thank you Jason, this makes sense. Let me give it a try and see if I can get it working this way.

replied on July 10, 2023

It looks like your documents are under version control and your SQL query is not updating those tables.

replied on July 17, 2023

Hi Jason,

Got this working, and you are correct - it is significantly faster in parallel batches (got it down from 5 hours to 1 hour on my testing subset!).

I also don't need to worry about accidentally breaking something with an unsupported database update if I do it this way, so I will go with this approach.

Thanks again! :)
