Question

Workflow DCC Schedule PDF Page Generation for pdf results in a tif with a missing page

asked on October 8

We have a workflow that uses "Schedule PDF Page Generation" and our Distributed Computing Cluster to generate pages that are then imported into a placeholder document.  It recently came to light during testing that although the pages were being generated and a new tif document was created, one of the pages in the pdf was not being generated.  This is a concern, as we need to be confident that all the pages present in the pdf are also present in the tif.

Has anyone encountered this before?  Any help would be greatly appreciated.

I have played with all the settings in the "Schedule PDF Page Generation" tool in Workflow, and page 2 will not be generated for this particular file.  

This is a 7 page document.  We have many that are well over 100.  This process is automated.  We need to be sure all pages are being generated.

Any assistance would be appreciated.

Thank you,

Christine

Additional info: when the "extract the text from each page" option is selected, the text for the missing page is present in the text field in Laserfiche, but there is no image (page) for that text.

Replies

replied on October 8

It looks like maybe you are trying to schedule the page generation and then immediately start moving pages and assigning field values, but that is not how the activity works.

When you send a document to the DCC for OCR or Page Generation, it is an asynchronous scheduled task, meaning Workflow moves on without waiting for the DCC to do anything other than confirm that the task was received and scheduled.

If you attempt to do something with that document based on an entry change event, there's a good chance you're grabbing it before the DCC has finished and only getting the first page(s).

If you want to do something after pages have been generated, you should use the advanced settings to set a callback workflow that will run once the DCC is actually finished generating pages.

There is also a callback workflow for failure so you can review errors, but I suggest reading through the documentation closely because the output is very different between the two callbacks.

Schedule PDF Page Generation (laserfiche.com)
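
The difference between "scheduled" and "finished" can be illustrated with a small Python sketch. This is not Laserfiche or DCC code: `generate_pages`, `document_pages`, and `on_complete` are hypothetical stand-ins, and a thread pool plays the role of the DCC. It only shows why a step that runs right after scheduling may see fewer pages than a completion callback does.

```python
import time
from concurrent.futures import ThreadPoolExecutor

document_pages = []  # stand-in for the image pages on the repository entry


def generate_pages(page_count):
    """Hypothetical stand-in for the DCC job: adds pages one at a time."""
    for number in range(1, page_count + 1):
        time.sleep(0.01)  # simulated per-page work
        document_pages.append(number)
    return page_count


seen_by_callback = []


def on_complete(future):
    """Runs only after the job has finished, like a success callback workflow."""
    future.result()  # re-raises here if the job failed
    seen_by_callback.extend(document_pages)


with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(generate_pages, 7)   # returns at once: the job is only scheduled
    early_snapshot = list(document_pages)  # what a step running right now would see
    job.add_done_callback(on_complete)

print(len(early_snapshot), "page(s) visible right after scheduling")
print(len(seen_by_callback), "page(s) visible in the completion callback")
```

The early snapshot will usually be empty or partial, while the callback always sees all 7 pages because it cannot run before the job is done.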

replied on October 9

Hey Jason,
Thank you for your response.  I have been doing some testing.  It appears that a callback workflow would require a whole new workflow to be started, rather than picking up where the main workflow left off (after the Page Generation step).  I really need to keep the Business Process Variables that were tokenized at the "Retrieve Business Process Variables" step, so I did not want to end the workflow and start a new one.  Instead, I am experimenting with the "Delay" option.  I have set the delay (after the page generation step) to 5 minutes.  I checked the DCC Admin Console, and it shows that the scheduled page generation task for this pdf completed successfully in 9 seconds.  After the 5 minute delay (to give the DCC time to generate pages), I have the process continue and complete in Workflow.

Still, only 6 pages are generated.  The text for the missing page (page 2 in the pdf) shows up in the text pane, but no corresponding page image has been generated.

From Laserfiche Web Admin for DCC:

I reviewed the information in the help files indicated.  Please let me know if my thought process about delaying the workflow (5 minutes) to give the DCC time to complete page generation is way off base.

As always, I truly appreciate any advice.

Thanks,

Christine

replied on October 9

A delay will never be 100% reliable; the DCC is meant to offload time-intensive and resource-intensive operations so it could be 5 minutes on average, but you could easily have some that take much longer, or fail.

If there is a backlog of scheduled jobs, limited resources, etc., those could all delay the completion of the scheduled jobs, so the callback is the only sure way to know it is done.

Before they added the callback options, I had to use a delay approach to check for OCR results, but the only reason that (sorta) worked is that I knew how many pages needed OCR, and as a fallback I would have it retry if the entry hadn't been modified in an extremely long time.

With page generation, there's no way to automatically check how many pages should've been generated from a PDF, or if any errors occur, so you can't really know if it is done by waiting alone.

I switched everything over to use callbacks literally as soon as those options became available because the other approach required a ton of upkeep and far more complex logic/checks, and it was still shaky at best.
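
For illustration only, the delay-and-check approach described above might look like the following Python sketch. Nothing here is a Laserfiche API: `get_page_count` is a hypothetical function that reports how many pages the entry currently has, and `make_counter` merely simulates an entry whose page count grows over time. The point is that polling can only succeed when the expected page count is known in advance, which is not the case for pages generated from an arbitrary PDF.

```python
import time


def wait_for_pages(get_page_count, expected, timeout_s, poll_s=0.01):
    """Poll until the entry reports at least `expected` pages or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_page_count() >= expected:
            return True
        time.sleep(poll_s)
    # One last check; False means "not done yet, failed, or lost a page".
    return get_page_count() >= expected


def make_counter(final):
    """Simulated entry: each check sees one more page, up to `final`."""
    state = {"pages": 0}

    def get_page_count():
        state["pages"] = min(state["pages"] + 1, final)
        return state["pages"]

    return get_page_count


# Works only because the expected count (7) is known up front.
print(wait_for_pages(make_counter(7), expected=7, timeout_s=1.0))
# A job that stops at 6 pages is indistinguishable from a slow one until the timeout.
print(wait_for_pages(make_counter(6), expected=7, timeout_s=0.05))
```

A timeout tells you only that the count was not reached, not why, which is part of what makes this approach harder to maintain than a callback.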

replied on October 9

The delay shouldn't be needed. Your original workflow already had a wait condition which, I assume, was meant to wait for pages to be added. That should've been enough. Moving all activities after Wait for Entry Change/Delay into their own workflow that gets triggered when DCC completes is another way to go, but I don't think it will solve this particular problem. This sounds specific to extracting image pages from this specific document.

If you import this document into the Windows Client and generate pages, do you get the first page?

Since this is a document-specific issue, contacting your Solution Provider to open a case with Laserfiche Support might be the best course of action.

replied on October 9

Thanks Jason, 
The workflow this is a part of does a lot based on Business Process Variables.  Can those be transferred from one workflow to another?

Christine

replied on October 9

I think Miruna is right, you should probably run a few more tests outside of that workflow process first to rule out any issues with the document.

@████████ it's been my experience that DCC saves progress as it goes. The Last Modified date seems to change periodically as it processes each page, which is what led me to believe the Wait for Entry Change activity was firing "early" in the workflow.

replied on October 9

Hello Miruna,
When I import the pdf into the Windows client and generate pages, all 7 pages are generated. 

The pdf and the generated pages match.

Thanks,

Christine

replied on October 9

Hi Christine,

Next you should try running a workflow that schedules PDF generation, but nothing else, then wait to see what happens; just don't try to edit the document in the interim otherwise it might lock the entry and the DCC job could fail.

The Client and DCC use different PDF libraries so a file that works in one may not necessarily work in the other.

In general, the library used for the DCC is much more reliable, so if things take too long or don't work at all you should check the DCC job history for errors.

replied on October 9

Hey Jason and Miruna,

I created a short workflow to generate pages and move them into a placeholder document.  To prepare to run this, I imported a copy of the pdf into Laserfiche without generating pages or ocr'ing.  I then grabbed the Entry ID and used that to start the workflow.  The resulting tif had all 7 pages, matching the source pdf.

Here is my workflow.

The page generation completed in 14 seconds.

Christine

replied on October 9

Any chance you have Audit Trail and it's set up to document creation and adding pages? That might be the easiest way to track if entry 139976 got 6 pages or 7 from DCC. If that's not available, we might be able to track it down from the Subscriber Trace logs, but that's a little bit more involved if the repository is busy.

I would expect one modification from DCC sending all image pages at the same time.

Might be worth checking if there were other workflows that ran on that entry. Maybe another instance sliced off the first page before Move Pages got to it?

replied on October 18

Hello,

An update: I tried the process on a 20 page pdf, and only 10 pages were placed in the tif document (the tif version created as part of the workflow process).

The pdf pages were numbered 1-20 at the bottom of each page.  In the newly generated tif, all the even pages were missing.

The original pdf in Laserfiche (that is also saved in Laserfiche as part of the process) has all 20.  I can delete the electronic document, and I am left with a 20 page tif.  Why would all 20 of those images not get moved into the placeholder tif in my workflow?

I am sure there is a setting somewhere I must be missing.

Christine

replied on October 21

Hello again,

After talking with our VAR (thanks Tom!), it seems the page loss was due to a script we are using to flatten a date stamp into the tiff.  We only needed the date stamp on the first page, so I have set up the workflow to move only page 1 of the pdf into a "Create Entry" tiff and run the date stamp script.  Farther down in the workflow, I added another Move Pages activity that moves only the pages after page 1 (2-) into the newly created tiff entry, which already contains the date-stamped first page.

No missing pages!

Wanted to share in case it helps someone else.

There is probably another way to do this, but this seems to be one way.  I am going to test some more, but am feeling pretty hopeful about this option!
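
In case it helps to see the order of the steps, here is a rough Python sketch of the idea. It is only a model, not Workflow or script code: documents are plain lists, and `select_pages` and `stamp_first_page` are hypothetical stand-ins for the Move Pages activity and the date stamp script. The final check compares the page count of the result against the source so a missing page is caught right away.

```python
def select_pages(pages, first, last=None):
    """Return pages first..last (1-based, inclusive); last=None means 'to the end'."""
    return pages[first - 1:last]


def stamp_first_page(document, stamp):
    """Stand-in for the date stamp script; it only touches page 1."""
    document[0] = f"{document[0]} [{stamp}]"


pdf_pages = [f"page {n}" for n in range(1, 21)]  # simulated 20 page source
tiff = []                                        # the newly created tiff entry

# Step 1: move only page 1, then stamp it while it is the only page present.
tiff.extend(select_pages(pdf_pages, first=1, last=1))
stamp_first_page(tiff, "DATE STAMP")

# Step 2: later in the workflow, move the remaining pages (2-).
tiff.extend(select_pages(pdf_pages, first=2))

# Sanity check: every source page should now be in the tiff.
if len(tiff) != len(pdf_pages):
    raise ValueError(f"expected {len(pdf_pages)} pages, got {len(tiff)}")
print(len(tiff), "pages in the tiff")
```

Because the script runs while the tiff holds only the first page, it never gets a chance to affect pages 2 and up.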

Christine
