
Question

New Line and regular expressions

asked on June 15, 2016

If my output from the Zone OCR is:

SONOMA COUNTY OFFICE OF ED
LEGAL DEPARTMENT
5350 SKYLANE BLVD.
SANTA ROSA, CA 95403-8246

 

Why does the regular expression (.+) not just grab up to the first new line like it should?

The resulting output is the entire text group as if new lines don't exist in it.

 

The pattern test shows the result I want, but running the process returns the entire OCR text as the result.
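The difference described above can be reproduced outside Quick Fields. The following is a minimal sketch in Python, used here only as a stand-in: its `re.DOTALL` flag plays the role of .NET's `RegexOptions.Singleline`. The variable names are illustrative, and the `\n` line breaks in the sample text are an assumption about what Zone OCR emits.

```python
import re

# Zone OCR text from the question; "\n" line breaks are assumed.
ocr_text = (
    "SONOMA COUNTY OFFICE OF ED\n"
    "LEGAL DEPARTMENT\n"
    "5350 SKYLANE BLVD.\n"
    "SANTA ROSA, CA 95403-8246"
)

# Default mode: "." does not match "\n", so (.+) stops at the first line break.
first_line_only = re.search(r"(.+)", ocr_text).group(1)

# DOTALL (counterpart of .NET Singleline): "." also matches "\n",
# so the same pattern swallows the whole block.
whole_block = re.search(r"(.+)", ocr_text, re.DOTALL).group(1)

print(repr(first_line_only))
print(repr(whole_block))
```

The pattern is identical in both calls; only the option that controls `.` changes the result.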


Answer

APPROVED ANSWER
replied on June 24, 2016

When I use the linked RegexTester with the singleline option I get 

 

However, here's an alternate idea that you might find easier for grabbing a specific line - making a multi-value token with each line as one value, and then using the token editor to grab only the value you want.

This is actually pretty easy to configure in Quick Fields, it will look something like this:
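As a rough illustration of the same idea outside Quick Fields, the sketch below splits the text into one value per line and then picks a value by position. It is not the Quick Fields configuration itself; the helper name `split_lines` and the `\n` line breaks are assumptions made for the example.

```python
import re

# Zone OCR text from the question; "\n" line breaks are assumed.
ocr_text = (
    "SONOMA COUNTY OFFICE OF ED\n"
    "LEGAL DEPARTMENT\n"
    "5350 SKYLANE BLVD.\n"
    "SANTA ROSA, CA 95403-8246"
)

def split_lines(text):
    # Hypothetical helper: one value per non-empty line,
    # tolerating both Unix and Windows line endings.
    return [part.strip() for part in re.split(r"\r?\n", text) if part.strip()]

lines = split_lines(ocr_text)

# Pick individual values by index, much like choosing one value
# of a multi-value token.
first_line = lines[0]
street = lines[2]

print(first_line)
print(street)
```

Because the split happens on the line break itself, the result does not depend on whether `.` matches `\n`.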

 


Replies

replied on June 15, 2016

Hi, 

 

I think you've run into a problem similar to the one described in this post:

 https://answers.laserfiche.com/questions/98768/Pattern-match-test-differs-from-process-result

 

See if my answer to that one helps clarify what's going on.

replied on June 15, 2016

(which will be fixed in Quick Fields 10) that causes the regex to run in multi-line mode during testing even though it runs in single-line mode during runtime. In multi-line mode (during testing), "." does NOT match the \n character. However, in single-line mode (and thus at runtime), the "." character matches everything, INCLUDING the \n character. 

 

This makes sense given what I was noticing.  Essentially the strings from the multi-line result are moved into a single string at runtime.  So the \n is removed and the (.+) just grabs the entire new single string.

Can you validate if this has been fixed in the QF 10 version?

replied on June 15, 2016

It's been fixed in Quick Fields 10 in the sense that the pattern matching test area now uses single-line mode, so that what you see during testing matches what happens at runtime. Actual runtime behavior has not changed between versions 9 and 10 (since changing it would break some users' existing patterns).

replied on June 15, 2016

Bummer.  I really need the multi-line test behavior to match the runtime so that regular expressions from other programs work the same way.

I can't write them differently just because QF doesn't behave like every other open-source regex implementation.

replied on June 15, 2016

I'm not sure what you mean. Quick Fields (and other LF products) use .NET regular expressions. There are many regular expression test tools available online if you choose not to use the built-in one.

http://regexstorm.net/tester is one example, and it allows you to check a box to indicate single-line mode. 

 

You can find more information about single-line vs multi-line modes here https://msdn.microsoft.com/en-us/library/yd1hzczs(v=vs.110).aspx
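For reference, the two modes are independent options in .NET: `RegexOptions.Multiline` changes what `^` and `$` match, while `RegexOptions.Singleline` changes what `.` matches. Python's `re.MULTILINE` and `re.DOTALL` behave the same way in this respect, so the sketch below uses them as stand-ins. The sample text and variable names are assumptions for illustration.

```python
import re

text = "TEST1\nTEST2\nTEST3"

# MULTILINE only affects the anchors: "^" and "$" match at every line.
anchored_default = re.findall(r"^\w+$", text)
anchored_multiline = re.findall(r"^\w+$", text, re.MULTILINE)

# DOTALL only affects ".": it starts matching "\n" as well.
dot_default = re.findall(r".+", text)
dot_dotall = re.findall(r".+", text, re.DOTALL)

# Both together: anchors match per line, but a greedy ".+" still
# runs across line breaks to the end of the text.
both = re.search(r"^.+$", text, re.MULTILINE | re.DOTALL).group(0)

print(anchored_default, anchored_multiline)
print(dot_default, dot_dotall)
print(repr(both))
```

So a pattern that relies on `.` stopping at a line break is sensitive to the single-line option only; the multi-line option does not change it.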

replied on June 24, 2016

Correct, but in Quick Fields, when I'm grabbing multiple lines and running a regex that matches on the newline, the test isn't valid.  Those websites will show me the result I want, but when I run the process I get the entire Zone OCR text because it isn't seen as individual lines.

Ex:

I test on the website:

(.+)\n     for:

TEST1

TEST2

TEST3

I will get the result:  TEST1.

 

In Quick Fields the same test gives:

TEST1TEST2TEST3

 

So I can't use a conventional regex website for that specific scenario.
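The behavior in this example can be sketched in Python, with `re.DOTALL` standing in for .NET's single-line option. The trailing line break on the sample text and the variable names are assumptions. The last pattern is one possible workaround rather than an official recommendation: a negated character class never matches a line break, whichever mode is active.

```python
import re

# Sample text from the post; the trailing "\n" is an assumption.
text = "TEST1\nTEST2\nTEST3\n"

# Typical website default: "." stops at "\n", so group 1 is the first line.
default_result = re.search(r"(.+)\n", text).group(1)

# Single-line mode: the greedy ".+" runs across line breaks and only
# backs off far enough to leave the final "\n" for the pattern.
singleline_result = re.search(r"(.+)\n", text, re.DOTALL).group(1)

# Mode-independent alternative: "[^\r\n]+" cannot cross a line break.
portable_result = re.search(r"([^\r\n]+)", text, re.DOTALL).group(1)

print(repr(default_result))
print(repr(singleline_result))
print(repr(portable_result))
```

With the negated class, the test tool and the runtime should agree regardless of which mode each one uses.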

replied on June 24, 2016

If you refer to the first picture in the post, that is using multi-line mode, and it shows the first line as the result.  At runtime the output is a single-line result containing all of the text.  That's the issue at hand.  The multi-value token doesn't help if the process treats the text as a single line when I try to grab the first line.

replied on March 22

I've been working through a similar scenario myself using QF 10.2. I expected (.+)\n to capture the first line of an address block but it returned the entire block instead. I tested using Regex Tester and the results were at least consistent, depending on the single/multi line settings and remembering to change the max number of matches value to 1.


I'm not entirely sure why but using (.+?)\n does the trick - only the first line is returned now so I'm sticking with that. I know adding ? makes the match optional but not sure why that makes a difference here. It works though so will leave it at that!
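A note on why this works: a `?` placed directly after a quantifier such as `+` does not make the match optional. It makes the quantifier lazy, so `.+?` takes as few characters as possible and stops at the first `\n` it can, while the greedy `.+` takes as many as possible and stops at the last one. The sketch below shows the difference in Python with `re.DOTALL` standing in for .NET's single-line mode; the `\n` line breaks and variable names are assumptions.

```python
import re

# Address block from the original question; "\n" line breaks are assumed.
block = (
    "SONOMA COUNTY OFFICE OF ED\n"
    "LEGAL DEPARTMENT\n"
    "5350 SKYLANE BLVD.\n"
    "SANTA ROSA, CA 95403-8246"
)

# Greedy: ".+" consumes everything, then backtracks to the LAST "\n".
greedy = re.search(r"(.+)\n", block, re.DOTALL).group(1)

# Lazy: ".+?" expands one character at a time and stops at the FIRST "\n".
lazy = re.search(r"(.+?)\n", block, re.DOTALL).group(1)

print(repr(greedy))
print(repr(lazy))
```

If the text uses Windows line endings (`\r\n`), the lazy capture would still include the trailing `\r`, so trimming the result or matching `\r?\n` may be needed.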
