Discussion

Extract the title from a file

posted on October 3, 2024

Hi, I have thousands of PDFs. The file name format is the same for each PDF. Example:

John Doe Smith 123456.pdf

My goal here is to extract only the numbers and put them straight into a metadata field. How can I accomplish this? I can use Workflow or Quick Fields for this project.

replied on October 3, 2024

I would suggest using the pattern matching workflow activity to extract those numbers. Pattern matching uses regular expressions (regex).

 

A couple of assumptions:

1. I'm assuming that the name of the document (Entry Name) in your example is John Doe Smith 123456.pdf

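2. The regex I used is .*\s+(\d+).pdf. Strictly speaking, the dot before the extension should be escaped (\.pdf) so that it matches only a literal period rather than any character. Depending on the actual naming of your files, you may need to adjust the regex a bit. You can also string regexes together using a pipe |.

Example: .*\s+(\d+)\.pdf|.*\s+(\d+)\.tif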
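
If you want to sanity-check the pattern on a few file names before building the workflow, here is a minimal Python sketch. It is only an illustration outside of Workflow: the function name extract_number is hypothetical, the dot before the extension is escaped, and it assumes the whole entry name must match the pattern. Workflow's pattern matching runs on a different regex engine, so treat this as a rough check rather than an exact replica.

```python
import re
from typing import Optional

# Same two alternatives as the example above, with the dot escaped so that
# only a literal ".pdf" or ".tif" extension matches.
PATTERN = re.compile(r".*\s+(\d+)\.pdf|.*\s+(\d+)\.tif")


def extract_number(entry_name: str) -> Optional[str]:
    """Return the digits just before the extension, or None if no match.

    Hypothetical helper for illustration; not part of any Laserfiche API.
    """
    # Assumption: the entire entry name has to match, not just a substring.
    match = PATTERN.fullmatch(entry_name)
    if match is None:
        return None
    # Only one side of the pipe can match, so exactly one group is set.
    return match.group(1) or match.group(2)


if __name__ == "__main__":
    for name in ["John Doe Smith 123456.pdf", "Scan 42.tif", "notes.pdf"]:
        print(name, "->", extract_number(name))
```

Because each side of the pipe has its own capture group, the number lands in group 1 for .pdf names and group 2 for .tif names, which is why the sketch checks both.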

Here is an example of multiple regexes being evaluated in a single regex:

Then add the Assign field workflow activity and assign the pattern-matching token to your desired field.

Good luck!
