
Question

Using Retrieve Document Text produces the error "The source document contained no pages. [0752-WF0]"

asked on September 9, 2024

I am processing a collection of PDF documents that were imported into the repository via Import Agent. I wasn't part of that effort, so I don't know the settings used during import. About 1,000 of these PDFs were not placed in the correct directory, and I need to process them. Our directory structure is organized by Year-Month, and I just need to move each document into the correct folder (creating a subfolder named after the case number).

To get the date, I am trying to use the Retrieve Document Text workflow activity and then run pattern matching on the text. However, I receive the error shown in the subject line on the very first document. I can open the PDF in the repository and it has 3 pages, so I am a bit baffled. I assume the activity needs a different type of page, and from other posts it may be that "generate text" was not selected during import, so this won't work. If that is the case, please confirm or provide some guidance on resolving this issue.
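The date-extraction and filing step described in the question can be sketched in Python. This is a minimal sketch, not Laserfiche Workflow's own pattern-matching configuration: it assumes the dates in the extracted text are written as MM/DD/YYYY, that the case number is already known (for example from a metadata field), and that the target path has the form `YYYY-MM\<case number>`. The function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Assumption: dates in the document text look like MM/DD/YYYY (e.g. 3/5/2024).
DATE_PATTERN = re.compile(r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/(\d{4})\b")


def extract_first_date(text: str) -> Optional[date]:
    """Return the first valid MM/DD/YYYY date found in text, or None."""
    for match in DATE_PATTERN.finditer(text):
        month, day, year = (int(g) for g in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            # Skip impossible dates such as 02/30/2023 and keep searching.
            continue
    return None


def target_folder(text: str, case_number: str) -> Optional[str]:
    """Build a hypothetical 'YYYY-MM\\<case number>' path, or None if no date is found."""
    found = extract_first_date(text)
    if found is None:
        return None
    return f"{found:%Y-%m}\\{case_number}"
```

For example, `target_folder("Filed on 3/5/2024 in court", "CASE-123")` yields `2024-03\CASE-123`. Note that this only works once the document actually has text; a document imported without pages returns nothing to match against.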


Replies

replied on September 9, 2024

It looks like they were imported without generating pages or text. You could use a workflow and the Distributed Computing Cluster (DCC) to generate the text. I'd recommend doing it overnight, and try a couple of documents first before dumping a thousand on your DCC server.

replied on September 10, 2024

Having a workflow like this can be very useful. You can call it conditionally: run a "Find Entry" activity with the additional property "Page Count", and if the page count is less than 1, invoke the page-generation workflow and wait for it to complete. Once that workflow completes, the document will have pages. Problem solved.
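The conditional check in this reply, combined with the earlier advice to try a couple of documents first, can be sketched in Python. The `Entry` class and both functions are hypothetical stand-ins, not Laserfiche APIs; the sketch only models the decision logic and assumes each entry's page count has already been read from the "Page Count" property.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a repository entry."""
    entry_id: int
    page_count: int  # assumed to come from the "Page Count" property in Find Entry


def needs_page_generation(entry: Entry) -> bool:
    # No pages means Retrieve Document Text has nothing to read.
    return entry.page_count < 1


def plan_batches(entries: List[Entry], trial_size: int = 2) -> Tuple[List[int], List[int]]:
    """Split entries lacking pages into a small trial run and the remainder."""
    pending = [e.entry_id for e in entries if needs_page_generation(e)]
    return pending[:trial_size], pending[trial_size:]
```

Running the trial batch first lets you confirm the page-generation workflow behaves as expected before sending the remaining documents to the DCC.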
