
Question

Regular Expression, two lines in retrieve document text

asked on November 18, 2014

I am trying to pattern match against the full document text in Workflow to set a field value. Because of the way the OCR in Snapshot has processed these documents (invoices), "Amount Due" and the actual amount are on separate lines. I have tried \r, \n, \r\n, etc. and cannot get it to match the whole thing.

 

Here is some of the document text

1000038010 
Amount Due Amount Paid: 
16.78 

I want to return just the 16.78 part. I have done lots of regex in Laserfiche, but never across a new line.

 

Any help would be great.

 

P.S. If I paste the text into Notepad++, it shows CR LF at the end of each line, so I would have thought \r\n would do it.


Answer

SELECTED ANSWER
replied on November 18, 2014

I just made a simple example based on your document text to test on my end, and I was able to match with "\r\n(\d+\.\d\d)\r\n" without the quotes. This should match anything on its own line, with a numeric format that is fixed to 2 places after the decimal. If you have more than one matching number in your document text, it will only grab the first one. You could also leave out the second "\r\n" in the expression if the number could possibly be at the end of the document. If it is guaranteed to be at the end of the document text, then it's more reliable to match with "(\d+\.\d\d)$" to grab the number just before the end. Is there a specific error you're getting?
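For anyone who wants to try the three expressions above outside of Workflow, here is a minimal sketch in Python. Python's re module is only a stand-in for the Laserfiche pattern matching engine, which may behave differently in places. The sample string is the document text from the question with CR LF line endings and without the trailing spaces, and the helper name first_group is made up for this illustration.

```python
import re

# Assumption: Python's re module stands in for the Laserfiche engine.
# Sample text from the question, CR LF line endings, no trailing spaces.
clean = "1000038010\r\nAmount Due Amount Paid:\r\n16.78\r\n"

# Number on its own line, exactly two digits after the decimal point.
own_line = re.compile(r"\r\n(\d+\.\d\d)\r\n")
# Same idea, but without requiring a line break after the number.
no_trailing = re.compile(r"\r\n(\d+\.\d\d)")
# Number sitting immediately before the end of the text.
at_end = re.compile(r"(\d+\.\d\d)$")


def first_group(pattern, text):
    """Return the first captured amount, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(first_group(own_line, clean))
print(first_group(no_trailing, clean))
print(first_group(at_end, "Amount Due Amount Paid:\r\n16.78"))

# A stray space after the number (common in OCR output) defeats the
# strict form, while the form without the second \r\n still matches.
spaced = "Amount Due Amount Paid: \r\n16.78 \r\n"
print(first_group(own_line, spaced))
print(first_group(no_trailing, spaced))
```

The last two calls show why leaving out the second "\r\n" can help: any character between the number and the line break, such as a trailing space, stops the stricter expression from matching.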


Replies

replied on November 18, 2014

Thanks.

I just found an unrelated answer where they used \r*\n*\s*, and that worked for me. However, I think your answer would work for me as well.

replied on November 18, 2014

Yeah, the OCR can throw in some unpredictable junk or spacing sometimes. What you found there is definitely more flexible than my answer. Glad you found a solution.
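To illustrate why \r*\n*\s* is more forgiving, here is a rough sketch, again using Python's re module as a stand-in for the Laserfiche engine. The thread does not show the full expression that was used, so anchoring on the label "Amount Paid:" and capturing the number after the flexible separator is an assumption made for this example, as are the sample strings.

```python
import re

# Assumption: the full expression is not given in the thread. Here the
# flexible separator sits between the "Amount Paid:" label and the amount.
flexible = re.compile(r"Amount Paid:\r*\n*\s*(\d+\.\d\d)")

# Hypothetical OCR variations of the same invoice text.
samples = [
    "Amount Due Amount Paid: \r\n16.78 \r\n",  # trailing spaces, CR LF
    "Amount Due Amount Paid:\n\n  16.78",      # LF only, blank line, indent
    "Amount Due Amount Paid: 16.78",           # amount on the same line
]

amounts = [flexible.search(sample).group(1) for sample in samples]
print(amounts)
```

Since \s already matches carriage returns and line feeds as well as spaces and tabs, the \s* part does most of the work here; the leading \r*\n* is harmless but not strictly needed.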
