Question

Which Workflow configuration do you think will process faster?

asked on February 17

I thought I would have a little fun today. I have a workflow to build that will search the repository for all documents brought into Laserfiche within the last 60 days, retrieve 15 metadata field values for each document, and then write the document information and metadata to a CSV file stored locally on the Workflow server.

Which workflow configuration do you think will be faster?

Option #1

  1. Search the repository
  2. Write header row of CSV file
  3. For Each Entry write detail line to CSV file

 

Option #2

  1. Search the repository
  2. Set the header row as a token value
  3. For Each Entry write detail to multivalue token
  4. Write full value (header+detail) to CSV file

 

Option #3

  1. Search the repository
  2. Set the header row as a token value
  3. For Each Entry write details to a single string token
  4. Write full value (header+detail) to CSV file

 

The sample I will be using is around 6,000 documents.

Let me know which option you think will process faster and why. I will post my results after I'm done.
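
For reference, the per-entry write in Option #1 would live inside a Script activity. A minimal sketch of what that might look like; the class skeleton approximates what the script designer generates (the base class name varies by Workflow version), and the "DetailLine" token name is just a placeholder:

    // Option #1, sketched: append one CSV row per entry inside the loop.
    using System.IO;

    public class Script1 : RAScriptClass104  // base class name varies by version
    {
        protected override void Execute()
        {
            // "DetailLine" is a hypothetical token holding one CSV row.
            string line = GetTokenValue("DetailLine").ToString();

            // Opening, appending to, and closing the file once per entry is
            // the cost that adds up across 6,000 iterations.
            File.AppendAllText(@"C:\Temp\export.csv", line + "\r\n");
        }
    }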


Replies

replied on February 17

Wanna test another option? Search Repository produces an XML file with the results. You could save it to an entry in the repo, then download it in the script and parse it as XML with something like this.
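
A rough sketch, assuming the results file has already been attached to an entry via Attach Electronic Document; the DocumentExporter call, the RASession property, and the entry ID are assumptions to verify against your RA SDK version:

    // Sketch: export the attached search-results file to disk, then parse it.
    using System.Xml.Linq;
    using Laserfiche.RepositoryAccess;

    public class Script1 : RAScriptClass104  // base class name varies by version
    {
        protected override void Execute()
        {
            // 12345 stands in for the entry holding the attached results file.
            DocumentInfo doc = Document.GetDocumentInfo(12345, RASession);

            string path = @"C:\Temp\searchresults.xml";
            new DocumentExporter().ExportElecDoc(doc, path);

            // The rows and field values can now be read straight from the XML.
            XDocument results = XDocument.Load(path);
        }
    }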

replied on February 17

I'll see what I can put together tomorrow and test it.

replied two days ago

Looking at the tokens for the Search Repository task, how do you get access to the XML file?

replied two days ago

The results file is only available in activities that take a file, like Attach Electronic Document. So you'd create an entry, attach the file to it, and then download it in the script.

replied two days ago

That is really cool. I never knew that the Search Repository activity output an XML file.

replied two days ago

Find Entries does too. 

replied on February 17

Here are the results for 6,274 documents:

Option #1: 28m 48s (Search repository: 23s 380ms)

Option #2: 4m 51s (Search repository: 22s 570ms) 🏆

Option #3: 9m 50s (Search repository: 21s 390ms)

Not what I was expecting; I thought #3 would win.

(Screenshots of the timing breakdown for Options #1, #2, and #3 were attached here.)

Let me know if you want to see any modifications and we can do a round #2.

replied on February 17

Man, you posted the answers as I was reading the question. But I was going to throw my guess at #2.

Also, to be clear, I assume your search activity pulled back all of the relevant fields rather than having to query each result document individually. At least, I've always been told that's the most efficient way to do it.

replied on February 17

Yes, correct. The Search Repository activity pulls back all of the relevant fields.

replied on February 17

I'm going to blame the fact that you used Visual Basic instead of C# and ignore any follow-up comments about individual activity times.

replied on February 17

Both languages are compiled down to the same common intermediate language and JIT-compiled at runtime, so I wouldn't expect to see any difference in performance. 🙂

replied one day ago

And FWIW, I had to look it up. Strings are generally immutable in memory, so string concatenation actually creates new objects in memory each time. Arrays just grow their allocated memory over time and are optimized for concat/append-type calls. That's why the MV token is faster.
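
A quick illustration of the difference in plain .NET (the counts are arbitrary):

    // Repeated string concatenation: each += allocates a brand-new string and
    // copies everything accumulated so far, so building N rows is O(N^2) work.
    string csv = "";
    for (int i = 0; i < 6000; i++)
    {
        csv += "detail line " + i + "\r\n"; // full copy on every iteration
    }

    // A List<string> (which is how a multi-value token behaves) appends in
    // place, growing its backing array geometrically: amortized O(1) per add.
    var lines = new System.Collections.Generic.List<string>();
    for (int i = 0; i < 6000; i++)
    {
        lines.Add("detail line " + i);
    }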

replied on February 17

Assuming you are scripting the CSV writes, I will go with option #2 because it contains fewer calls to the script activity. I also think it makes more sense programmatically: gather all of the info and write it out all at once.

However, the concerns for option #2 might outweigh the benefits:

  1. Making sure there is no limit on the number of values that can be written to a multi-value token
  2. Understanding the server overhead for manipulating large multi-value tokens
  3. You are actually traversing the data twice, once to gather the token values, and again in the script activity to write the values out

Interesting question!  (At least to us geeks)

Prizes for selecting the correct answer will be awarded at Empower?  ;-)

replied on February 17

Yes, to write to the CSV file I would be using a Script activity.

I like the idea of an award. I'll see what I can come up with.

replied on February 17

Do you even need a multi-value token? Couldn't you just write it all to a string token and then pass that string to the file at the end?

replied on February 17

I think we might have an Option #3.

replied on February 17 Show version history

I agree with Cliff and Zachary. Calling a script for each row would be less efficient since there's some extra overhead to running scripts.

I still like using a multi-value for tracking, but at the very least I'd flatten the content with line breaks then write it all at once with WriteAllLines in the script.
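
A minimal sketch of that final write, assuming the multi-value token comes back as an enumerable (the token names, path, and base class name are placeholders):

    // Flatten header + detail rows and write the file in one shot.
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;

    public class Script1 : RAScriptClass104  // base class name varies by version
    {
        protected override void Execute()
        {
            var rows = new List<string> { GetTokenValue("Header").ToString() };
            foreach (object detail in (IEnumerable)GetTokenValue("DetailLines"))
            {
                rows.Add(detail.ToString());
            }

            // One open/write/close instead of 6,000 append cycles.
            File.WriteAllLines(@"C:\Temp\export.csv", rows);
        }
    }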

replied on February 17

Random thought on the single-token option: 6,000 documents x 15 fields = 90,000 values. If you assume an average value length of 20 characters, you end up with a token 1,800,000 characters long. And that number doesn't include the row and column delimiters that you would have to weave into the data.

replied on February 17

1. Don't track tokens in either option

2. Just script the whole thing :)

replied on February 17

Just what I was thinking. 

The search and the loop (for each entry) will be the biggest performance hit.

Search, then blah 6,000 rows at your CSV.

Can the script "search and blah"?

replied on February 17

I use option 3 for CSV reports. I just did a test with 6,000 rows and 12 fields of data, and it ran in less than 6 minutes. Zac FTW!

replied on February 17

Zac and Anthony, if you guys have scripts you want me to throw in the mix, supply them and I'll test them out.

replied on February 17

I haven't used self-hosted in years 😉, but if you're attending my empower class I might show you something related.
