Question

Extracting PDF metadata

Workflow

Updated December 29, 2015

asked on August 20, 2015

Is it possible (or planned to be possible) to extract/query metadata stored within a PDF file using Workflow? I'm referring to PDF metadata fields such as keywords, author, etc., and would prefer to use a standard activity rather than a custom one.

We are looking at processing bank statements in PDF format, and key information is stored within the PDF keywords field.

I appreciate we can query form fields, but the PDFs do not contain form fields in this case. If the metadata is already stored electronically, then it makes sense to query it directly rather than trying to read it using Quick Fields.

Thanks!

Nigel.


Answer

SELECTED ANSWER

replied on December 29, 2015

This feature has been implemented as the PDF Metadata activity in Workflow 10.


Replies

replied on August 20, 2015

Hi Nigel,

Thank you for the feature request. Do you have a sample PDF that we could use to see what type of information you want extracted?

Are your PDFs already in Laserfiche?

What will you do with the information once it's extracted?

Do you have a use case for modifying the metadata?

Thanks!


replied on August 21, 2015

Hi Ed,

I've attached a sample PDF to which I've added some metadata in the PDF keywords field ("12345678") as below. I understand that this field is present in all PDF documents.

The PDFs will most likely already be in Laserfiche and processed by Workflow (or Quick Fields).

The use case is that a customer will be receiving large volumes of banking documents from a major bank in PDF format instead of paper. These PDFs will contain key information (like customer number, statement number) as delimited values in the keywords field and we are aiming to extract that information to populate template fields.

Without the ability to query the PDF fields directly, we would have to attempt to read this information from a scanned copy of the PDF, which seems like a backward step given that the information is already stored electronically within the PDF metadata.

A sample of the metadata stored in the PDF is given below:

Thanks,

Nigel.

Sample.pdf (36.67 KB)



replied on August 21, 2015

Do you know what the delimiter would be?


replied on August 21, 2015

In the example I've read about (I don't have an actual PDF), it will be a semicolon, but I understand it can be a comma (and usually is, according to Adobe).
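Since the delimiter may be either a semicolon or a comma, here is a minimal Python sketch of how a keywords string could be split and mapped to template fields. It is only an illustration of the parsing step, not how Workflow's PDF Metadata activity works. The function names, the field names in `FIELD_NAMES`, and the second sample value are hypothetical; only "12345678" comes from the sample PDF.

```python
import re

# Hypothetical template field names, in the order the bank is assumed
# to write the values into the keywords field.
FIELD_NAMES = ["Customer Number", "Statement Number"]


def split_keywords(keywords):
    """Split a keywords string on semicolons or commas and trim whitespace."""
    parts = re.split(r"[;,]", keywords)
    # Drop empty entries left by trailing or doubled delimiters.
    return [part.strip() for part in parts if part.strip()]


def to_fields(keywords, field_names=FIELD_NAMES):
    """Pair each extracted value with a field name, by position."""
    return dict(zip(field_names, split_keywords(keywords)))


if __name__ == "__main__":
    # "000042" is a made-up statement number used for illustration.
    print(to_fields("12345678; 000042"))
```

Accepting both delimiters in one pattern avoids having to know in advance which one the bank uses, though it would break if a value itself could contain a comma or semicolon.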
