Question

Pattern Matching Layout

asked on July 28, 2021

I'm trying to capture the 10.50. Does Pattern Matching recognize this type of layout?

Check
Amt
$10.50

I know it works for this type of layout:

Check Amt $10.50

Replies

replied on July 28, 2021

You can look for new lines in your pattern too. A line break is made up of two characters, though: a carriage return and a line feed. The carriage return moves the cursor back to the beginning of the line, and the line feed moves it down to the next line.

\r\n together will match a line break, where \r is the carriage return and \n is the line feed.

Amt\r\n(.*), for example, will pick up everything on the line that follows the word "Amt" and a line break.
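
For anyone who wants to try the line-break idea outside the product, here is a minimal sketch in Python. It assumes Python's re module behaves closely enough to the Pattern Matching engine for this case and that the text uses Windows-style \r\n line endings; the sample string is built from the question.

```python
import re

# Assumed sample: the stacked layout from the question, with \r\n line endings.
text = "Check\r\nAmt\r\n$10.50\r\n"

# "." does not match "\n", so (.*) stops at the end of the line after "Amt".
match = re.search(r"Amt\r\n(.*)", text)

# "." can still match the trailing "\r", so strip whitespace from the capture.
amount_line = match.group(1).strip() if match else None
print(amount_line)
```

If the text layer turns out to use bare \n line endings, \r?\n in place of \r\n should cover both cases.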

replied on July 28, 2021

An expression like this should get the decimal value for either pattern:

Check[\s\r\n]*Amt[\s\r\n]*\$(\d+\.\d\d)
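
A quick way to check that this expression covers both layouts is a sketch like the one below. It uses Python's re module as a stand-in for the Pattern Matching engine (an assumption), and both sample strings come from the question.

```python
import re

# The expression from the reply above, unchanged.
PATTERN = re.compile(r"Check[\s\r\n]*Amt[\s\r\n]*\$(\d+\.\d\d)")

# The two layouts from the question: stacked on three lines, and on one line.
stacked = "Check\r\nAmt\r\n$10.50"
inline = "Check Amt $10.50"

# group(1) is the decimal value without the dollar sign.
stacked_amount = PATTERN.search(stacked).group(1)
inline_amount = PATTERN.search(inline).group(1)
print(stacked_amount, inline_amount)
```

Note that \s already includes \r and \n, so [\s\r\n]* behaves the same as \s* here.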

 

replied on July 28, 2021

This is great, but I failed to mention that there is more data than that on the three lines. Ex.

TOTALS:    ALLOWED    DEDUCT    TOTAL    PROV PD    CHECK
           AMT        AMT       AMT      AMT        AMT
           $100.00    $5.98     0.00     $90.00     $90.00

And of course I only want to retrieve the $90.00 from the CHECK AMT column.

replied on July 28, 2021

It all depends on which pattern is consistent and most reliable. For example, I might just look for the 5th dollar amount found here.

Knowing that you can look for line breaks when needed allows you to come up with any sort of pattern.

What you can't do is say "look here and then down 30 pixels." Text data is not measured in pixels; that only applies to OCR systems.
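
The "take the 5th amount" idea could be sketched roughly as follows. This is Python rather than a Pattern Matching expression, the optional dollar sign is an assumption so that the bare 0.00 is counted too, and the sample text is the block posted above with its spacing tidied.

```python
import re

# Assumed sample: the totals block from the post above, with \r\n line endings.
text = (
    "TOTALS:    ALLOWED    DEDUCT    TOTAL    PROV PD    CHECK\r\n"
    "           AMT        AMT       AMT      AMT        AMT\r\n"
    "           $100.00    $5.98     0.00     $90.00     $90.00\r\n"
)

# Collect every decimal amount, with or without a leading "$".
amounts = re.findall(r"\$?(\d+\.\d\d)", text)

# The CHECK AMT value is the 5th amount in this layout (index 4).
fifth_amount = amounts[4] if len(amounts) >= 5 else None
print(fifth_amount)
```

This only holds up if no other decimal amounts appear earlier in the searched text, so in practice it probably needs to be anchored to the TOTALS block.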

replied on July 28, 2021

OK, the actual format is this, and all our PDFs have the same format/layout, but the totals can be on different pages. Some PDFs have one page, some have two, and some have three, so the page count varies.
 

TOTALS:    # OF      ALLOWED    DEDUCT    TOTAL    PROV PD    CHECK
           CLAIMS    AMT        AMT       AMT      AMT        AMT
           11        $100.00    $5.98     0.00     $90.00     $90.00

How would I configure this to get my projected dollar amount of 90.00?

replied on July 28, 2021

PDFs are often image files, so it sounds like you're working with a text layer that is included with the PDF. Take the 5th dollar amount you find.

replied on July 29, 2021

Here is how I would form the expression:

  1. Find "TOTALS"
  2. Find any value other than new line 1 or more times "[^\r\n]+"
  3. Find new line "\r\n"
  4. Find any value other than new line 1 or more times "[^\r\n]+"
  5. Find the last instance in line of "AMT"
  6. Find any value other than new line 0 or more times "[^\r\n]*"
  7. Find new line "\r\n"
  8. Find any value other than new line 1 or more times "[^\r\n]+"
  9. Find the last instance in line of dollar sign "\$"
  10. Select 1 or more numeric digits followed by period or comma followed by 2 more numeric digits "(\d+[\.,]\d\d)"

 

Final regular expression:

TOTALS[^\r\n]+\r\n[^\r\n]+AMT[^\r\n]*\r\n[^\r\n]+\$(\d+[\.,]\d\d)
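
To see the steps above in action, here is a small sketch that runs the final expression against the posted sample. Python's re module stands in for the Pattern Matching engine (an assumption), and the sample text assumes \r\n line endings and the dollar signs as originally posted.

```python
import re

# The final expression from the reply above, unchanged.
PATTERN = re.compile(
    r"TOTALS[^\r\n]+\r\n[^\r\n]+AMT[^\r\n]*\r\n[^\r\n]+\$(\d+[\.,]\d\d)"
)

# Assumed sample: the three-line totals block as posted, with \r\n line endings.
text = (
    "TOTALS:    # OF      ALLOWED    DEDUCT    TOTAL    PROV PD    CHECK\r\n"
    "           CLAIMS    AMT        AMT       AMT      AMT        AMT\r\n"
    "           11        $100.00    $5.98     0.00     $90.00     $90.00"
)

match = PATTERN.search(text)

# The greedy [^\r\n]+ before \$ backtracks from the end of the line,
# so the capture is the amount after the last "$" on the third line.
check_amount = match.group(1) if match else None
print(check_amount)
```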

 

replied on July 29, 2021

Getting closer. I made a small mistake: there is no $ sign, sorry. So I removed the \$, and it picked up 0.00 instead of 90.00.

Much appreciated, thanks for all your hard work.

replied on July 29, 2021

Try this

TOTALS[^\r\n]+\r\n[^\r\n]+AMT[^\r\n]*\r\n[^\r\n]*\s(\d+[\.,]\d\d)
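
The sketch below shows why simply dropping \$ returned 0.00 and why the \s in this version matters. It is Python, used here as a stand-in for the Pattern Matching engine (an assumption), and the sample text assumes the layout described above without dollar signs and with \r\n line endings.

```python
import re

# Assumed sample: the totals block as described, without "$" signs.
text = (
    "TOTALS:    # OF      ALLOWED    DEDUCT    TOTAL    PROV PD    CHECK\r\n"
    "           CLAIMS    AMT        AMT       AMT      AMT        AMT\r\n"
    "           11        100.00     5.98      0.00     90.00      90.00"
)

# Shared part of both expressions: the TOTALS line and the AMT line.
PREFIX = r"TOTALS[^\r\n]+\r\n[^\r\n]+AMT[^\r\n]*\r\n"

# Earlier expression with \$ removed: the greedy [^\r\n]+ also swallows the
# "9" of the last "90.00", so only the tail "0.00" is left for the capture.
loose = re.compile(PREFIX + r"[^\r\n]+(\d+[\.,]\d\d)")

# This reply's expression: \s forces the capture to start right after
# whitespace, so the whole last amount is captured.
strict = re.compile(PREFIX + r"[^\r\n]*\s(\d+[\.,]\d\d)")

loose_amount = loose.search(text).group(1)
strict_amount = strict.search(text).group(1)
print(loose_amount, strict_amount)
```

So the 0.00 in the earlier attempt was most likely the tail end of 90.00 rather than the value from the TOTAL AMT column.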

 

replied on July 29, 2021

Yessss, this works!!!! Thanks
