Question

Regex to Capture Amount Only with Decimal

asked on March 4, 2022

Hello, LF community!  I am trying to use regex with Connector to format a value captured by zone OCR.  Possible sample values are as follows:

Discount Amount:  .00  

Discount Amount:  100.00

Discount Amount:  10,000.00

I need to be able to return only the number with decimal point for entry into a metadata field in a LF template in the repository (which is formatted to currency).  I can successfully exclude the "Discount Amount:" term, but still end up with a leading whitespace before the number and the comma inside the number.  To achieve this result, I'm currently using:

(?:[^:]+)$

I'm sure I'm missing something simple, but I just can't figure this out!  I would greatly appreciate any help to return just the number and decimal point.
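To see why the pattern above leaves the space and comma in place, here is a minimal Python sketch. It assumes Python's `re` module behaves like Connector's regex engine for a pattern this simple, which may not hold in every detail. The sample strings are the ones listed in the question, and `match_original` is a hypothetical helper name.

```python
import re

# The pattern from the question: everything after the last colon, up to end of line.
ORIGINAL = r"(?:[^:]+)$"

samples = [
    "Discount Amount:  .00",
    "Discount Amount:  100.00",
    "Discount Amount:  10,000.00",
]

def match_original(text: str) -> str:
    """Return whatever the original pattern matches (empty string if nothing)."""
    m = re.search(ORIGINAL, text)
    return m.group(0) if m else ""

for s in samples:
    # repr() makes the leading whitespace visible in the printed result.
    print(repr(match_original(s)))
```

The character class `[^:]` accepts spaces and commas as readily as digits, so both end up in the match.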

Answer

SELECTED ANSWER
replied on March 4, 2022

I believe something like this would do it, at least up to 999,999.99, which is likely good enough for a discount :).
(You could also add another "(\d{1,3})?,?" in front to capture up to 999 million.)

Discount Amount:\s*?(\d{1,3})?,?(\d{1,3})?(\.\d\d)

But let's break it down a bit:
The first part, "Discount Amount:", is anchoring text and helps you verify you're capturing the correct number.  This is followed by \s*?, which means "any amount of whitespace" (the trailing ? makes the match lazy, so it consumes only as much whitespace as it needs to).

Then we have "(\d{1,3})?", which says there is an optional set of 1-3 digits that we want to capture.  This would be capturing the thousands block (e.g., values between 1,000.00 and 999,999.99).

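And an optional comma (",?") follows that, which means that if the value is less than 1000, the pattern will still match.  Because the comma sits outside the parentheses, it is matched but not captured, so it is left out of the captured value.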

This pattern then repeats for the "ones block" (e.g., values between 1.00 and 999.99), except that it doesn't need the optional trailing comma.

And then finally we capture the "cents block" which, importantly, captures the decimal point (so we don't accidentally turn 1.00 into 100).
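The breakdown above can be checked with a short Python sketch. It assumes that the tool returns the captured groups joined together, which is an assumption about Laserfiche rather than something stated here, and it uses Python's `re` module as a stand-in for the actual engine. `extract_amount` and `PATTERN_MILLIONS` are hypothetical names, and the last sample value is made up to exercise the "999 million" variant.

```python
import re

# Pattern from the answer: the digit blocks and the cents are captured,
# while the comma is matched outside the parentheses and never captured.
PATTERN = re.compile(r"Discount Amount:\s*?(\d{1,3})?,?(\d{1,3})?(\.\d\d)")

# Variant with one more optional block in front, as suggested for larger values.
PATTERN_MILLIONS = re.compile(
    r"Discount Amount:\s*?(\d{1,3})?,?(\d{1,3})?,?(\d{1,3})?(\.\d\d)"
)

def extract_amount(text: str, pattern: re.Pattern = PATTERN) -> str:
    """Join the captured groups, skipping optional groups that did not match."""
    m = pattern.search(text)
    if m is None:
        return ""
    return "".join(g for g in m.groups() if g is not None)

print(extract_amount("Discount Amount:  .00"))        # .00
print(extract_amount("Discount Amount:  100.00"))     # 100.00
print(extract_amount("Discount Amount:  10,000.00"))  # 10000.00
print(extract_amount("Discount Amount:  1,234,567.89", PATTERN_MILLIONS))  # 1234567.89
```

Joining only the groups is what removes the comma; the full match itself would still contain it.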

replied on March 7, 2022

When I copied and pasted the pattern, the result value was blank; however, I was able to get it to display the numbers properly by leaving out the anchoring text and the whitespace match.  The zone OCR contains only this anchoring text and one number, so I think it should work consistently.  Here is what worked for me:

(\d{1,3})?,?(\d{1,3})?(\.\d\d)

I certainly couldn't have figured this out without your assistance!  Thank you very much, Jacob!

replied on March 15, 2022

If you are looking to shorten the pattern at all, ([\d\,]*\.\d{2}) can be used as well.

And that can be shortened even further to ([\d\,\.]+), but it runs the risk of matching other items on the page.
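For comparison, here is a rough Python sketch of how the two shorter patterns behave, again using Python's `re` module as a stand-in for the actual engine. One thing to watch: in both patterns the comma sits inside the capturing group, so under standard regex semantics the captured value keeps the comma. `first_capture` is a hypothetical helper, and the "Invoice 42," line is a made-up example of other text landing in the OCR zone.

```python
import re

SHORT = re.compile(r"([\d\,]*\.\d{2})")
SHORTEST = re.compile(r"([\d\,\.]+)")

def first_capture(pattern: re.Pattern, text: str) -> str:
    """Return the first captured group of the first match, or an empty string."""
    m = pattern.search(text)
    return m.group(1) if m else ""

# The comma is inside the capture group, so it is kept.
print(first_capture(SHORT, "Discount Amount:  10,000.00"))  # 10,000.00
print(first_capture(SHORT, "Discount Amount:  .00"))        # .00

# Hypothetical noisy zone: the shortest pattern grabs the first run of
# digits, commas, and periods it finds, which is not the amount.
print(first_capture(SHORTEST, "Invoice 42, Discount Amount:  10,000.00"))  # 42,

# If the comma has to go, it can be stripped in a separate step.
print(first_capture(SHORT, "Discount Amount:  10,000.00").replace(",", ""))  # 10000.00
```

If the target field needs the value without a comma, the longer pattern with separate capture groups, or a follow-up replace step like the last line, may be the safer choice.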

replied on March 16, 2022

Thank you, Justin!  I will try this out.
