Question

Pattern matching for only 1 line of text

asked on March 1, 2017

Hello Everyone,

I need a little help capturing one line of text in a large Zone OCR box that picks up multiple lines of text. 

 

This example is what I was working with. I need to capture everything after TO THE but nothing on the next line. I used \r \n \s with no luck. Thanks for the help. 

Answer

SELECTED ANSWER
replied on March 2, 2017

Try

TO THE\s*([^\r\n]+)\r\n\w+
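A quick way to sanity-check this pattern outside of Quick Fields is Python's `re` module. This is only a sketch under assumptions: Python's regex engine is not the Laserfiche pattern matching engine, and the sample text is made up, since the original screenshot is not available. It assumes Zone OCR ends each line with \r\n.

```python
import re

# Hypothetical Zone OCR output (not from the original post);
# line breaks are assumed to be \r\n.
sample = "PAY TO THE ACME SUPPLY CO\r\nORDER OF\r\n"

# Capture everything after "TO THE" up to the first line break,
# then require a line break followed by at least one word character.
pattern = re.compile(r"TO THE\s*([^\r\n]+)\r\n\w+")

match = pattern.search(sample)
vendor = match.group(1) if match else None
print(vendor)
```

The capture group excludes \r and \n, so it cannot run onto the next line, and the trailing \r\n\w+ only checks that some word follows without caring which word it is.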

Replies

replied on March 1, 2017

Try \n\n. Zone OCR ends lines with 2 newline characters.

replied on March 1, 2017

I gave it a try and it failed. 

Vendor Name is the OCR box

Ven_PM is the pattern

What's really interesting is that in my environment, I can get the pattern to work. 

This is theirs;

 

Mine is 10.0.0.976.

I'm going to update them to the latest QF and see if that helps. 

replied on March 1, 2017

Just upgraded to the latest, 10.1.0.168

still getting this;

I added the \w at the end to see if that would help and it did not. The \n\n did not work at all. No match. I don't want to downgrade, what now? 

replied on March 2, 2017

If you make a dummy pattern such as .* and check the box that says "Show line break characters as \r and \n" then you will be able to see whether those characters are present in your test value. This can help a lot in situations like this when you're not sure what linebreak characters you need to account for in your pattern. 

You'll also want to make sure that the data in your test value area is copied and pasted from the document's OCR or Zone OCR output to ensure that what you're testing with has the same linebreak characters you'll get when you run it on the actual document. 
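If you want to inspect the line breaks outside the pattern editor as well, something like the following Python sketch does roughly the same job as the "Show line break characters" checkbox. The function name and the sample text are made up for illustration; they are not part of any Laserfiche tool.

```python
def show_line_breaks(text: str) -> str:
    """Return text with CR and LF replaced by the visible literals \\r and \\n."""
    return text.replace("\r", "\\r").replace("\n", "\\n")


# Hypothetical OCR output with mixed line endings.
ocr_text = "PAY TO THE ACME SUPPLY CO\r\nORDER OF\n"

visible = show_line_breaks(ocr_text)
print(visible)
```

Seeing the text this way makes it obvious whether a given line ends in \r\n or in a bare \n, which is exactly what the pattern has to account for.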

replied on March 2, 2017

I got it to work since I realized the next line text will always be ORDER. Built the pattern to match that and then \n worked. Not sure why it wouldn't work relying purely on the \n or \r.  

TO THE\s*([\w\s]+)\nORDER

Thanks for the help everyone. 

 

replied on March 7, 2017

Hello again, 

My pattern stopped working because OCR for some reason decides to not read ORDER correctly and drops the OR. Since my pattern was looking for an O, it fails. I tried Alex's pattern and that one worked. 
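The difference between the two patterns can be reproduced with a rough Python sketch. Again, this assumes \r\n line endings and uses made-up sample text, and Python's `re` is not the Quick Fields engine, so treat it as an illustration only.

```python
import re

# Pattern that depends on the next line starting with the literal word ORDER.
anchored = re.compile(r"TO THE\s*([\w\s]+)\nORDER")
# Pattern that accepts any word on the next line.
generic = re.compile(r"TO THE\s*([^\r\n]+)\r\n\w+")

# Hypothetical OCR output: one clean read, one where OCR dropped the "OR".
good_ocr = "PAY TO THE ACME SUPPLY CO\r\nORDER OF\r\n"
bad_ocr = "PAY TO THE ACME SUPPLY CO\r\nDER OF\r\n"


def capture(pattern, text):
    """Return the first capture group with surrounding whitespace trimmed, or None."""
    m = pattern.search(text)
    return m.group(1).strip() if m else None


results = {
    "anchored_good": capture(anchored, good_ocr),
    "anchored_bad": capture(anchored, bad_ocr),
    "generic_good": capture(generic, good_ocr),
    "generic_bad": capture(generic, bad_ocr),
}
print(results)
```

In this sketch the ORDER-anchored pattern returns nothing for the misread text, while the generic one still captures the vendor line. Note also that [\w\s]+ can swallow the \r before the \n, which is why the helper trims the result.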

Alex, is there a good site you use to test patterns? I have been using regex101.com in single-line mode, but I could not get it to behave exactly like QF. I copied the text from OCR and used your pattern there and got no match, yet when I run the session it works. Any suggestions would be great. Thank you. 
