Question

Regular Expression, Remove Words from Result Value

asked on January 29, 2020

I'm not having much luck in my research, and I'm very new to regular expressions, so I'll post the question.

I am OCRing page 1 of a PO using QuickFields and putting it in Laserfiche. I then use Workflow to read the text and extract data to be used as metadata (we don't own the version of QuickFields that has zonal OCR). This is the pattern I'm using for Pattern Matching:

(\W*(Purchase Order:)\W*(\S[0-9a-zA-Z-]*))

It helps me get my PO #, but how do I get rid of "Purchase Order:"?


I thought this post would help, but I couldn't get it to work:

https://answers.laserfiche.com/questions/161683/Regular-expression-to-exclude-all-instances-of-something#161712
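
To see why "Purchase Order:" ends up in the result, it can help to list what each pair of parentheses in the question's pattern captures. The sketch below uses Python's `re` module as a stand-in for Laserfiche's Pattern Matching, which is an assumption (the two engines may differ in details), and the sample text is made up.

```python
import re

# Pattern copied from the question; the sample text is hypothetical.
PATTERN = re.compile(r"(\W*(Purchase Order:)\W*(\S[0-9a-zA-Z-]*))")
SAMPLE = "Purchase Order: PO-12345 Date: 01/29/2020"

match = PATTERN.search(SAMPLE)
# groups() returns the three parenthesized groups, outermost first.
whole, label, po_number = match.groups()

print(repr(whole))      # 'Purchase Order: PO-12345'
print(repr(label))      # 'Purchase Order:'
print(repr(po_number))  # 'PO-12345'
```

The outer parentheses wrap the label as well as the number, and the label has its own group too, so only the third group holds the PO number by itself.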


Replies

replied on January 29, 2020

Try this:

Purchase Order: ([.\-]*)

replied on January 29, 2020

Thanks! This worked for me:

Purchase Order:([.\-]*)\W*(\S[0-9a-zA-Z-]*)
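
A note on why this combination works: inside square brackets a dot is a literal dot, so `[.\-]*` only matches dots and hyphens and captures nothing here, while the second group does the actual work. A rough check in Python's `re` module, used as a stand-in for Laserfiche's engine (an assumption), with made-up sample text:

```python
import re

SAMPLE = "Purchase Order: PO-12345 Date: 01/29/2020"  # hypothetical text

# Inside [...], "." is a literal dot, so this class is "dot or hyphen".
dots_and_hyphens = re.compile(r"[.\-]*")
only_punctuation = bool(dots_and_hyphens.fullmatch("..--"))
po_like_value = bool(dots_and_hyphens.fullmatch("PO-12345"))
print(only_punctuation)  # True
print(po_like_value)     # False

# Pattern copied from the reply above.
combined = re.compile(r"Purchase Order:([.\-]*)\W*(\S[0-9a-zA-Z-]*)")
first, second = combined.search(SAMPLE).groups()
print(repr(first))   # ''
print(repr(second))  # 'PO-12345'
```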

replied on January 29, 2020

Awesome! If you want some help with regex, this is a really good site to create test cases and visualize what's happening: https://regexr.com/

The only caveat is that LF handles capture groups differently: regexr highlights the entire match even when you put parentheses around part of it, whereas Laserfiche will only "pluck out" what's in the parentheses.

By the way, I had an error in my original. This might be simpler for you:

Purchase Order: (.*)
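
As a rough illustration of the difference between the whole match (what a tester highlights) and the captured group (what gets plucked out), here is the same pattern in Python's `re` module. Python stands in for Laserfiche's engine here, which is an assumption, and the sample text is made up.

```python
import re

# Hypothetical OCR text spanning two lines.
SAMPLE = "Purchase Order: PO-12345 Date: 01/29/2020\nShip To: Warehouse 4"

match = re.search(r"Purchase Order: (.*)", SAMPLE)
full_match = match.group(0)  # everything the pattern matched
captured = match.group(1)    # only what is inside the parentheses

print(repr(full_match))  # 'Purchase Order: PO-12345 Date: 01/29/2020'
print(repr(captured))    # 'PO-12345 Date: 01/29/2020'
```

Because `.*` is greedy and `.` matches any character except a newline by default, the group runs to the end of the line rather than stopping after the PO number.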

replied on January 29, 2020


How do I get rid of the text that follows? Anything after the next space should not be part of the result value.

SELECTED ANSWER
replied on January 30, 2020

Purchase Order: ([\w\-]*)\s*

This says "Once you see the block of text 'Purchase Order: ', capture 0 or more word characters* and hyphens after it, and stop at the first character that is neither (a space, for example) or at the end of the text. The \s* then matches any whitespace that follows, but because it sits outside the parentheses, that whitespace is not captured."

*word characters are a-z, A-Z, 0-9, and _

If you have a spare hour to kill and are more the "learn by doing" type, this is a great website with progressively difficult practical challenges: http://play.inginf.units.it/#/

And the site https://regexr.com/ is fantastic for testing any other expressions you want to check.
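
For anyone who wants to try the pattern outside Laserfiche, here is a minimal sketch in Python's `re` module. Python is only a stand-in for Laserfiche's engine (an assumption), `extract_po` is a hypothetical helper name, and the sample strings are made up.

```python
import re
from typing import Optional

# Pattern copied from the answer above.
PO_PATTERN = re.compile(r"Purchase Order: ([\w\-]*)\s*")

def extract_po(text: str) -> Optional[str]:
    """Return the captured PO number, or None if the label is not found."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None

with_trailing_text = extract_po("Purchase Order: PO-12345 Date: 01/29/2020")
at_end_of_text = extract_po("Purchase Order: PO_77-A")
no_label = extract_po("Invoice: 991")

print(with_trailing_text)  # PO-12345
print(at_end_of_text)      # PO_77-A
print(no_label)            # None
```

The capture stops at the space before "Date:" because a space is neither a word character nor a hyphen.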

replied on January 30, 2020

Thank you so much for your patience and tutoring me! I'll find an hour to kill in order to be able to learn regular expressions.
