Discussion

Quick Fields: Page Removal task not working consistently

posted on July 20, 2022

I have a document that I'm using for testing; it has 17 blank pages in it.  I have the Page Removal task set up for each document classification, with a file size parameter set to <9000kb. 

Out of the 17 blank pages, 7 are not being deleted and are being placed into the Unidentified Documents location.  I checked the file sizes of the remaining blank pages and they are all 0.8kb, so I would think the file size shouldn't be the issue. 

Thoughts or advice?

Thanks.

 

replied on July 20, 2022

When you run the profile, what does it show in the output messages?

Going into unidentified suggests that they are not meeting the necessary criteria for any of your classifications.

replied on July 20, 2022

That's exactly what it shows: the criteria were not met.  Possibly because I have some document classes set up to limit the number of pages in the Last Page Identification section.  I could remove that limitation and maybe that will get it working...

replied on July 20, 2022

Last page identification could be a factor. If your Last Page Identification is cutting the document off prematurely, then I believe the remaining pages would get processed as the start of a new document and may not meet any of the identification criteria.

You might see mention of that in the output as well if you scroll through and check the Messages for all those steps.

replied on July 21, 2022

Removing the last page identification resolves the issue, but presents another question.  If the second page of a document is something that I want to remove, it seems to me that QF should treat the second page removal request as something that should be done immediately and not run it through every doc class first.  Logically, at least to me, if the first page identification has identified the document, then the page removal task in the Last Page Identification should remove the requested page. 

replied on July 21, 2022

I think some clarification is needed regarding first/last page identification.

Removing a page doesn't inherently mean that every page will be run through all of your document classifications. The reason your blank pages are being run through the classification is because they are being treated as a new document.

Last page identification means exactly that: it marks the last page of the document. So if you set it up so it "completes" a document at the blank page, everything after that becomes a new document, hence the classification checks.

The logical reasoning behind this is that Quick Fields can be used to parse individual documents out of a consolidated file. For example, if you pull in a PDF that consists of multiple scanned invoices, the first/last page identification allows you to parse them out into separate documents when they're processed.

However, in your case, you just want to remove the blank items from the document, in which case you would want that to occur within the classification activities rather than splitting them off into a new document.

Instead of creating a new document by triggering Last Page Identification, try adding your page removal to the Page Processing or Post-Processing activity.

Add the Page Removal activity, then set it to the parameters you want, which I believe you said was based on file size.

replied on July 21, 2022

That's what I'm now doing for the blank pages and it works fine.  I was trying to do something similar for some other documents that have a second page that is not necessary but has text (such as page 2 of a W4), so I specified removing page 2 in the Post-Processing page removal task.  Instead of removing page 2, however, it just continues on to the following doc classes trying to classify the page.

replied on July 21, 2022

Something else had to be going on as a result of Last Page Identification rather than the page removal.

I have several QF profiles that remove pages and I've never had it try to process the removed page as a separate classification.

Post-Processing is the very last thing to happen, so it sounds like your "extra" page is getting split off and processed separately before it even reaches the "Page Removal" task.

That has to be the case because Page Removal is literally just deleting the page so it would never trigger a new document classification.

replied on July 21, 2022

The blank pages continue to go through doc class evaluation too, although they do end up getting deleted.  My Last Page Identification is set to "When the first page identification conditions are satisfied".

replied on July 21, 2022

What do you mean when you say they still go through doc class evaluation?

Are they being evaluated as a NEW document, or are they simply going through all the checks associated with the activities WITHIN the classification?

If you're deleting the pages in Post-Processing then it makes perfect sense for the page to still go through every check in the classification because you're telling it not to delete anything until after it checks the entire document.

If you want a page to be removed immediately, then you should remove pages during Page Processing.

 

For example, if you tell it to remove Page 2 in Post-Processing, but anything before that limits the document to 1 page, then by the time it gets to Post-Processing it already thinks "page 2" is the start of a new document so it would run it through a new classification process and there wouldn't be a page 2 to remove.
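
To make that example concrete, here is a rough Python sketch of the timing problem. It is only a toy model, not anything Quick Fields exposes: `split_by_page_limit` and `remove_page` are made-up helper names, and the split is simplified to a fixed page limit with no classification step.

```python
from typing import List, Optional


def split_by_page_limit(pages: List[str], max_pages: Optional[int]) -> List[List[str]]:
    """Hypothetical splitter: close each document once it reaches max_pages."""
    if max_pages is None:
        # No limit: every page stays in a single document.
        return [list(pages)] if pages else []
    return [pages[i:i + max_pages] for i in range(0, len(pages), max_pages)]


def remove_page(document: List[str], page_number: int) -> List[str]:
    """Return the document without the 1-based page_number; unchanged if absent."""
    if 1 <= page_number <= len(document):
        return document[:page_number - 1] + document[page_number:]
    return list(document)


scanned = ["W4 page 1", "W4 page 2"]

# Limit of 1 page: page 2 is already its own document before removal runs.
limited = [remove_page(doc, 2) for doc in split_by_page_limit(scanned, max_pages=1)]

# No limit: page 2 is still part of the document, so it can be removed.
unlimited = [remove_page(doc, 2) for doc in split_by_page_limit(scanned, max_pages=None)]

print(limited)    # [['W4 page 1'], ['W4 page 2']]
print(unlimited)  # [['W4 page 1']]
```

In the limited case the removal step runs but finds no page 2 in either document, which matches the behavior described above.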

 

It's hard to say exactly what's going on without knowing the entire process or how your classification is configured.

You can tell a lot more about what is happening by looking at the output messages because that will tell you every activity that's being touched, in what order, and the results of any conditional evaluations.

replied on July 21, 2022

Going through all document classes for identification is expected behavior for your selected option. Once Quick Fields finds the first page of a new document, every subsequent page needs to be evaluated to see whether it's part of the currently open doc or a new one or an unidentified page. So it will go through first page identification in each doc class in order to see if it matches any of those. If yes, Quick Fields will close the current doc and make a new one from that page and start the process again with next page. And so on. If none of the identification conditions match, then it needs to decide whether it belongs in the current doc or in Unidentified.

It's hard to guess without seeing the session, but likely the white page is being processed when there is no open document, so it ends up as unidentified.
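
As a rough illustration of the loop described above, here is a small Python sketch. It is a simplified model, not Quick Fields code: `DocClass`, `Document`, `identify`, and the `max_pages` limit are hypothetical names, and it assumes an unmatched page always joins the open document when there is one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DocClass:
    """Hypothetical stand-in for a document class; not a Quick Fields API."""
    name: str
    is_first_page: Callable[[str], bool]  # first page identification condition
    max_pages: Optional[int] = None       # assumed page limit that closes the document


@dataclass
class Document:
    doc_class: str
    pages: List[str] = field(default_factory=list)


def identify(pages: List[str], doc_classes: List[DocClass]) -> Tuple[List[Document], List[str]]:
    documents: List[Document] = []
    unidentified: List[str] = []
    open_doc: Optional[Document] = None
    open_class: Optional[DocClass] = None

    for page in pages:
        # Check the page against first page identification in each class, in order.
        match = next((dc for dc in doc_classes if dc.is_first_page(page)), None)
        if match is not None:
            # A match closes any open document and starts a new one.
            open_doc, open_class = Document(match.name, [page]), match
            documents.append(open_doc)
        elif open_doc is not None:
            # No match, but a document is open: the page joins it (assumption).
            open_doc.pages.append(page)
        else:
            # No match and nothing open: the page is unidentified.
            unidentified.append(page)

        # Assumed behavior: a page limit closes the document once it is reached.
        if open_doc is not None and open_class is not None and open_class.max_pages is not None:
            if len(open_doc.pages) >= open_class.max_pages:
                open_doc, open_class = None, None

    return documents, unidentified


classes = [
    DocClass("W4", lambda text: text.startswith("W4"), max_pages=1),
    DocClass("Invoice", lambda text: text.startswith("INVOICE")),
]
# Empty strings stand in for blank pages.
scanned = ["W4 form", "", "INVOICE 1001", "line items", ""]
docs, leftover = identify(scanned, classes)
```

With this input, the blank page after the one-page W4 arrives when no document is open and lands in `leftover`, while the blank page after the invoice is absorbed into the open invoice document.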
