
Question

Pattern Matching in QFs

asked on April 4

Good afternoon everyone,

 

Instead of using OmniPage OCR for 1700 single-page documents, I'm trying to be a little more efficient and use Pattern Matching. I have the right idea, and as long as the data I'm trying to grab is only one word, it works like a charm, but if there are multiple words, it only grabs the first one.

We're trying to capture data like First Name, Last Name, and Department. So for the department Information Technology, I'm only grabbing Information, and we know not all users have one-word first or last names.

Department: \s*(\w+)\s* 

Any help or guidance on this would be greatly appreciated.

Thanks!
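
To illustrate why the pattern in the question stops at the first word: `\w` matches letters, digits, and underscores but not spaces, so `(\w+)` ends at the first space. Below is a minimal sketch in Python, used here only for illustration; the Quick Fields regex engine may differ in details. The sample line and the multi-word variant are assumptions, not something from the thread, and the variant only handles words separated by single spaces (names with hyphens or apostrophes would need a wider character class).

```python
import re

# Hypothetical sample line; real OCR output may look different.
line = "Department:     INFORMATION TECHNOLOGY"

# Pattern from the question: \w+ cannot cross a space,
# so only the first word is captured.
one_word = re.search(r"Department: \s*(\w+)\s*", line).group(1)

# Assumed variant: one word, optionally followed by more words
# that are each separated by exactly one space.
multi_word = re.search(r"Department:\s*(\w+(?: \w+)*)", line).group(1)

print(one_word)
print(multi_word)
```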


Answer

SELECTED ANSWER
replied on April 12

Disregard... I was able to find what I was looking for by using the Reg Ex below. Thanks again to everyone who offered to help.

 

Reg Ex:       Department:\s+(.*?)\s+Department:
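
As a quick check of how this pattern behaves, here is a small Python sketch run against the sample line posted in the replies. Python's `re` module is used only for illustration and may not match the Quick Fields engine in every detail. The lazy `(.*?)` grows one character at a time until whitespace followed by the next `Department:` label can match, so the capture ends at the last word of the first department name. Note that the pattern relies on a second `Department:` label appearing later on the same line.

```python
import re

# Sample line as posted in the thread.
line = ("Department:     BOARD OF ELECTIONS                   "
        "Department:     BOARD OF ELECTIONS")

# Pattern from the selected answer.
pattern = re.compile(r"Department:\s+(.*?)\s+Department:")

match = pattern.search(line)
department = match.group(1) if match else None

print(department)
```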


Replies

replied on April 4

It would be pretty tough to write regex for this without an example of the text you're trying to extract from. That said, your easiest solution would be to plug in some sample text to ChatGPT, explain what you want the regex to do, and provide an example of the desired output. It will answer with surprising accuracy in most cases.

replied on April 8

Thanks for responding, Mr. @████████.  I've used ChatGPT for other stuff, but did not think about that this time.  I'll give that a shot and try to remember that going forward.

Thanks again!

replied on April 8

One word of caution here: check your company policy on AI tools before pasting in parts of your company documents as sample text.

replied on April 10

Thanks for the advice, @████████. I'm only looking to grab one section of one line of text. I've figured the rest out. Below is an example, and I'm trying to get the first instance of "BOARD OF ELECTIONS". Other examples would look the same, except "BOARD OF ELECTIONS" might be "INFORMATION TECHNOLOGY" or "911 COMMUNICATIONS". Thanks!

 

Example: "Department:     BOARD OF ELECTIONS                   Department:     BOARD OF ELECTIONS"

replied on April 10

Once you have all the Departments in your PatternMatch multivalue token, assign the values to a new multivalue token using the Remove Duplicates function in the token editor.

Note: you may have to loop through each department value in the PatternMatch token to trim it before removing duplicates.

**********************************************************

Sorry, you had stated QF and I gave you Workflow... In QF, in addition to your original Pattern Match, add 4 more Pattern Match activities below the first.

In the 2nd Pattern Match, first, apply the Trim function and then apply the Index to return all values separated by the Bar (|) and a pattern of ^\s*(\S.*\S)\s*$ to return all the departments as a single string separated by the Bar.

Note that I am assuming your token from the first Pattern Match that gets all the Department lines is called pmDepartment

In the 3rd Pattern Match, use the Pattern of ([^|]*) to split the single line token back into a multivalue token.

In the 4th Pattern Match, first, apply the Index to return all values separated by the Bar (|) and then apply the Remove Duplicates function and a pattern of (.*) to return all the trimmed departments as a single string separated by the Bar.

In the 5th (final) Pattern Match, use the Pattern of ([^|]*) to split the single line token back into a multivalue token.

 

Important: The order of the Function and Index is critical. You may have to manually arrange them after the functions and index are added. Note the syntax in the screenshots.
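
The net effect of the five Pattern Match steps above is: trim each value, join with the bar, split again, and remove duplicates. A rough Python sketch of that logic follows, for illustration only; it is not Quick Fields syntax. The function name `unique_departments` and the sample values are hypothetical. One deliberate difference: the split uses `[^|]+` rather than `([^|]*)`, because Python's `re.findall` also returns empty matches for the `*` form.

```python
import re

def unique_departments(pm_department):
    """Trim, join, split, and de-duplicate department values."""
    # Step 2: trim each value using the pattern from the reply.
    trimmed = []
    for value in pm_department:
        m = re.match(r"^\s*(\S.*\S)\s*$", value)
        if m:
            trimmed.append(m.group(1))
    # Join all values into one bar-separated string.
    joined = "|".join(trimmed)
    # Steps 3 and 5: split the string back into separate values.
    parts = re.findall(r"[^|]+", joined)
    # Step 4: remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(parts))

# Hypothetical values standing in for the pmDepartment token.
sample = ["  BOARD OF ELECTIONS  ", "BOARD OF ELECTIONS", " INFORMATION TECHNOLOGY"]
result = unique_departments(sample)
print(result)
```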

replied on April 12

@████████, I appreciate you taking the time to include all these steps.  I haven't tried it yet, but just wanted to be sure we were on the same page.

Are all these steps necessary when looking for the first group of text between Department: and Department: in my line of text above?

Thanks

replied on April 12

Ahhh. I thought you wanted all the departments that were listed on the page, but without duplicates.
