Corrupted Files - Identify and Tag with Workflow

asked one day ago

Good morning all,

I am trying to create a simple workflow to identify corrupt or password-protected PDFs so that I can tag them and exclude them from an OCR process. I have read this post about the DCC cluster and password-protected workflows, but I am getting stuck drafting my workflow and was hoping folks could help.

My goal is to have a try-catch activity that checks the PDF name and tags it if there is an error, and that part is working fine. My issue is finding all PDFs in my repository that have 0 pages and making each of them available to the "PDF Metadata" activity. When I do a "For Each Entry" and nest a try-catch with the "PDF Metadata" activity inside it, the "PDF Metadata" activity can't see the current entry from the "For Each Entry", so it only runs on the first search result. Is there an easier way to do this?

Any help would be much appreciated!


replied one day ago

I think the issue here is that an entry and the electronic document attached to it are two different things. The For Each Entry activity loops through the list of entries, not the electronic files attached to those entries.

I haven't tested it, but I think you need the Download Electronic Document activity to download the electronic document on the current entry before you can read the metadata from that downloaded document.


replied 10 hours ago

That's exactly what it is. Search Repository produces a collection of entries, not of PDF files. You need to download the PDF from the current entry with the Download Electronic Document activity and point PDF Metadata at the downloaded file.
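
If it helps to see the loop written out, here is a rough Python sketch of the control flow only. It is not Laserfiche code: `Entry`, `download_edoc`, `read_pdf_metadata`, and the "Exclude from OCR" tag name are all made up for illustration, and the PDF checks are crude stand-ins for what the real PDF Metadata activity would do. The point is just that the download step sits inside the loop, ahead of the metadata read, and that both sit inside the try-catch.

```python
from dataclasses import dataclass, field
from typing import Optional


class PdfReadError(Exception):
    """Raised by the stand-in steps when a document cannot be read."""


@dataclass
class Entry:
    # Hypothetical model of a repository entry; not a Laserfiche type.
    name: str
    edoc: Optional[bytes]  # the attached electronic document, if any
    tags: set = field(default_factory=set)


def download_edoc(entry: Entry) -> bytes:
    # Stand-in for the Download Electronic Document step.
    if entry.edoc is None:
        raise PdfReadError("entry has no electronic document")
    return entry.edoc


def read_pdf_metadata(data: bytes) -> dict:
    # Stand-in for the PDF Metadata step. These checks are deliberately
    # crude; a real PDF reader does far more validation than this.
    if not data.startswith(b"%PDF-"):
        raise PdfReadError("not a readable PDF")
    if b"/Encrypt" in data:
        raise PdfReadError("PDF is encrypted")
    return {"version": data[5:8].decode("ascii", "replace")}


def tag_unreadable(entries):
    # For each entry: download first, then read metadata from the download.
    for entry in entries:
        try:
            data = download_edoc(entry)
            read_pdf_metadata(data)
        except PdfReadError:
            # Hypothetical tag name; use whatever tag the OCR search excludes.
            entry.tags.add("Exclude from OCR")
    return [e.name for e in entries if "Exclude from OCR" in e.tags]
```

In the Workflow designer the equivalent nesting would be For Each Entry, then Try-Catch, then Download Electronic Document followed by PDF Metadata in the try branch, with the tag assignment in the catch branch.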

