Raymond,
Is there any way to determine if there is a password on a document?
Currently the system is set up like this:
1. The customer gets emails with PDFs attached. Occasionally the lawyers who send these PDFs have encrypted them, but most are not encrypted.
2. The customer drags these PDFs (generally without opening them) to a Windows share that is monitored by Import Agent. These documents have their pages extracted, but Import Agent cannot extract pages and OCR them at the same time, so we run OCR in a later step.
3. A process OCRs all of the documents. Right now we use a Quick Fields Agent session to do this, but we may be transitioning to Distributed OCR.
4. A second Quick Fields session runs on the now OCR'd documents to identify forms. We keep these sessions separate because we also keep a copy of the original packet, so that users can refer back to it if not all forms are separated and/or identified. Occasionally some of these form identification and extraction processes require zone OCR, which works better if the document is converted to a TIFF.
So our big issue is what happens with PDFs that are encrypted or that have redaction annotations applied over portions of their pages. If the entire PDF is locked, it generally goes into the IAError folder. If there is an annotation blocking part of a page (say, something placed over a SS#), a lot of the time page generation works (including what's underneath that annotation!), but the generated page has some corruption on it that causes OCR to fail on that page. The text for this page will either be blank or contain only a very small section of the page.
This causes issues because then the identification of forms fails.
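As a stopgap, the symptom described above (blank or nearly blank text after OCR) could itself be used as a flag before form identification runs. The following is only a minimal sketch: it assumes the per-page OCR text has already been pulled out as plain strings by some other means, and both the function name `suspect_pages` and the 200-character threshold are hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def suspect_pages(page_texts: Sequence[str], min_chars: int = 200) -> List[int]:
    """Return 1-based page numbers whose OCR text is blank or very short.

    min_chars is an assumed threshold; whitespace is ignored when counting.
    """
    flagged = []
    for page_number, text in enumerate(page_texts, start=1):
        visible_chars = len("".join(text.split()))
        if visible_chars < min_chars:
            flagged.append(page_number)
    return flagged


if __name__ == "__main__":
    # Page 1 has plenty of text, page 2 is blank, page 3 has only a fragment.
    pages = ["x" * 300, "", "Case No. 12"]
    print(suspect_pages(pages))
```

Genuinely blank pages (separator sheets, backs of duplex scans) would be flagged as well, so a check like this is better suited to routing documents for review than to rejecting them outright.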
So in the end, it would be very helpful to have some way to tell whether a document is one of these special cases, so that we can separate those documents out for someone to manually reprint or OCR by hand (so they get the password prompt).
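One possible pre-check, run on the share before Import Agent picks the files up, is to scan the raw bytes of each PDF. This is a rough sketch and not a Laserfiche feature: an encrypted PDF carries an `/Encrypt` entry in its trailer, and a redaction annotation has the subtype `/Redact`. The names `classify_pdf` and `route_pdf` and the review folder are hypothetical. The scan is a heuristic: annotations stored inside compressed object streams will be missed, and other markup types (for example a filled square drawn over a SS#) are not detected.

```python
import re
import shutil
from pathlib import Path

# Heuristic markers matched against the raw bytes of the file.
# \b keeps /Encrypt from matching the unrelated /EncryptMetadata key.
ENCRYPT_RE = re.compile(rb"/Encrypt\b")
REDACT_RE = re.compile(rb"/Subtype\s*/Redact\b")


def classify_pdf(data: bytes) -> str:
    """Return 'not_pdf', 'encrypted', 'redacted', or 'plain' for raw file bytes."""
    if b"%PDF-" not in data[:1024]:
        return "not_pdf"
    if ENCRYPT_RE.search(data):
        return "encrypted"
    if REDACT_RE.search(data):
        return "redacted"
    return "plain"


def route_pdf(path: Path, review_dir: Path) -> str:
    """Move anything that is not a plain PDF into review_dir (hypothetical folder)."""
    kind = classify_pdf(path.read_bytes())
    if kind != "plain":
        review_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(review_dir / path.name))
    return kind
```

Files that come back as `plain` would stay on the monitored share, while everything else lands in the review folder for someone to open and reprint by hand.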