Question

Pattern Match works in testing, and in Output collection, but fields are blank.

asked on April 5, 2023

Could I get help with this issue again? I have linked below two other LF Answers posts I made in past years about this issue; however, the discussions there haven't helped me clear this one up yet. I've tried all the suggested changes and still can only get so far. The Output is definitely collecting the right Owner's Name; however, the field is blank. The field has been set with the Pattern Matching token for Owner's Name, and that token tests out fine, but the field does not get filled.

https://answers.laserfiche.com/questions/80302/Quick-Fields-pattern-match-works-in-testing-but-not-when-session-is-run#80610

https://answers.laserfiche.com/questions/79981/Quick-Fields-Pattern-Matching-works-but-value-in-field-does-not-match#80041


Answer

SELECTED ANSWER
replied on April 6, 2023

Is the OCR actually putting spaces in the word OWNERS:, or does it just look that way visually?  If it is actually putting spaces, so that it is O W N E R S: instead of OWNERS:, that could be part of the issue, since the RegEx you are using is specifically looking for the word OWNERS before the name, but won't match to O W N E R S:.

If that is the case, something as simple as accounting for possible spaces within the word OWNERS: may be sufficient.  Here's the same RegEx string I posted before modified to allow for up to 3 spaces between each letter of the word OWNERS: - testing again on RegEx101.com, it seems to be working. 

O\s{0,3}W\s{0,3}N\s{0,3}E\s{0,3}R\s{0,3}S\s{0,3}:\s*[\r\n]*(.*)[\r\n]*

 

It's still looking for the exact word OWNERS:, just with possible spaces in between. If the OCR is misreading it, for example using a zero instead of an O so that it reads 0WNERS:, it won't catch that. The whole thing has been written on the assumption that the name comes after the word OWNERS:, so if there are any errors in identifying the word OWNERS:, it won't succeed at finding the name.
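
The spaced-out pattern above can also be generated rather than typed by hand. The sketch below uses Python's re module purely as a stand-in (the Quick Fields engine may behave differently). The function names, the `confusions` option, and the sample text are illustrative assumptions, not part of Quick Fields or the pattern posted above. The optional `confusions` map goes one step further than the posted pattern by also accepting a zero where the OCR should have read an O.

```python
import re

def spaced_keyword_pattern(keyword, max_gap=3, confusions=None):
    """Build a regex that tolerates up to max_gap whitespace characters between letters.

    confusions is a hypothetical extra: it maps a letter to every character
    the OCR might produce for it, e.g. {"O": "O0"}.
    """
    confusions = confusions or {}
    gap = r"\s{0,%d}" % max_gap
    parts = []
    for ch in keyword:
        alternatives = confusions.get(ch)
        if alternatives:
            # Character class listing every accepted reading of this letter.
            parts.append("[" + "".join(re.escape(a) for a in alternatives) + "]")
        else:
            parts.append(re.escape(ch))
    return gap.join(parts)

# Same tail as the posted pattern: optional whitespace, then the name.
TAIL = r"\s*[\r\n]*(.*)[\r\n]*"

def find_owner(text, confusions=None):
    """Return the text captured after the (possibly spaced-out) label, or None."""
    pattern = spaced_keyword_pattern("OWNERS:", confusions=confusions) + TAIL
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(find_owner("O W N E R S:\nJOHN DOE\nPO BOX 12\n"))           # JOHN DOE
print(find_owner("0WNERS:\nJANE ROE\n"))                           # None
print(find_owner("0WNERS:\nJANE ROE\n", confusions={"O": "O0"}))   # JANE ROE
```

Generating the pattern keeps the gap size and the label in one place, so trying a different label or a wider gap is a one-argument change.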

 

Also, just to note.  I don't have experience with QuickFields - I've been coming at this as a purely RegEx question.  If the RegEx is actually working, and it's just QuickFields behaving weird, then I apologize for steering you down the wrong path.


Replies

replied on April 5, 2023

The fact that it is matching in your testing, but not in the live processing, makes me wonder if there is a difference in line breaks between your testing and the live form.

In Windows, you're going to see line breaks as \r\n but within data from a database or other process, you may only see \r or \n, so I think it might be better to tell it to check for \r or \n or both.  If we say [\r\n]* it will match 0 to infinite occurrences of \r or \n, so it could also match \r\n or \n\r or \r\r\r\r\n, etc.

I just tested this: 

OWNERS:\s*[\r\n]*(.*)[\r\n]*

in both Workflow and at regex101.com and it seemed to work for me.

Note that the match group you had in your screenshot, (.+?), is lazy, so unless the pattern requires something after it, it is probably just going to grab the first letter of the name, whereas (.*) will grab everything up until it encounters the following value (which is a line break).

Here's the explanation from regex101.com regarding what each part is doing:

  • OWNERS: matches the characters OWNERS: literally (case sensitive)
  • \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
    • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • Match a single character present in the list below [\r\n]
    • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    • \r matches a carriage return (ASCII 13)
    • \n matches a line-feed (newline) character (ASCII 10)
  • 1st Capturing Group (.*)
    • . matches any character (except for line terminators)
    • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • Match a single character present in the list below [\r\n]
    • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    • \r matches a carriage return (ASCII 13)
    • \n matches a line-feed (newline) character (ASCII 10)
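
As a rough check of the breakdown above, the pattern can be run against sample text with different line-break styles. This is a sketch in Python's re module, used only as a stand-in for the Laserfiche engine; the sample strings and the `owner_line` helper are made up for illustration.

```python
import re

PATTERN = re.compile(r"OWNERS:\s*[\r\n]*(.*)[\r\n]*")

def owner_line(text):
    """Return the first captured line after OWNERS:, or None if there is no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # In Python, "." excludes only \n, so a capture taken from \r\n text can
    # end in a stray \r; strip it before using the value.
    return match.group(1).rstrip("\r")

samples = {
    "LF only": "OWNERS:\nJOHN DOE\nPO BOX 12\n",
    "CRLF": "OWNERS:\r\nJOHN DOE\r\nPO BOX 12\r\n",
    "CRLF with blank line": "OWNERS:\r\n\r\nJOHN DOE\r\nPO BOX 12\r\n",
}
for label, text in samples.items():
    print(label, "->", repr(owner_line(text)))
```

All three samples yield 'JOHN DOE' here. Because \s already includes \r and \n, the \s* does the work of skipping line breaks and blank lines in these samples, and the [\r\n]* that follows it has nothing left to match.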

replied on April 5, 2023

Matthew, I'm looking more closely at your notes now. I did try \n at first and, while looking for better results, I also tried \r\n, because the "Show line break characters as \r and \n" option was showing that the Result Value was the owner's name plus \r. So I left the \r\n in my formula. Just now I tried OWNERS:\s*\r?(.+?)\r but got the same result: no owners in the fields, although in testing it looks great.

I was using the (.+?) because it was successfully grabbing only the person's name and not the mailing address that is on the lines below.  Just now I tried changing the (.+?) to the (.*) and not changing anything else and the result was a collection of the name and everything below that line.
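
The difference between the lazy and greedy groups can be reproduced outside Quick Fields. The sketch below uses Python's re module as a stand-in and assumes, hypothetically, that the OCR text ends each line with a bare \r (consistent with the Result Value showing the name plus \r). The sample text and the `capture` helper are illustrative.

```python
import re

# Hypothetical OCR text whose lines end in a bare \r.
text = "OWNERS:\rJOHN DOE\rPO BOX 12\r"

def capture(pattern):
    """Return capture group 1 for the first match in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

# Lazy group followed by a required \r: stops at the first \r after the name.
lazy_then_cr = capture(r"OWNERS:\s*\r?(.+?)\r")
# Greedy group followed by a required \r: runs on to the last \r it can find.
greedy_then_cr = capture(r"OWNERS:\s*\r?(.*)\r")
# Lazy group with nothing required after it: settles for a single character.
lazy_unanchored = capture(r"OWNERS:\s*[\r\n]*(.+?)[\r\n]*")

print(repr(lazy_then_cr))      # 'JOHN DOE'
print(repr(greedy_then_cr))    # 'JOHN DOE\rPO BOX 12'
print(repr(lazy_unanchored))   # 'J'
```

Under that assumption, the lazy group returns only the name because the \r after it is required, while the greedy group also picks up the lines below.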

I did remove the "Match Case" checkmark.

Regarding whitespaces... my scanned pages do have an issue with spreading out letters, however, I was collecting the Output collection and using it for the "Test the pattern" steps hoping that it was a true test using only what text was actually captured by the Zone OCR.
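
When text is copied from the Output collection into a tester, it can help to make whitespace visible, so that spread-out letters and line breaks are easy to compare between the test text and the live text. A small sketch, assuming the captured text is available as a Python string; the helper name and the markers are illustrative.

```python
def show_invisibles(text):
    """Return text with spaces, tabs and line breaks replaced by visible markers."""
    markers = {" ": "<SP>", "\t": "\\t", "\r": "\\r", "\n": "\\n"}
    return "".join(markers.get(ch, ch) for ch in text)

print(show_invisibles("O W N E R S:\r\nJOHN DOE\r"))
```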

 

replied on April 6, 2023

Much appreciated, Matthew!  Every bit of information is helpful!

In answer to your question about what the OCR is pulling... I believe it is pulling the OWNERS: okay, as I was copying the Quick Fields output results to test with and it looked okay. I did want to try a RegEx that would account for blank spaces, just in case that was it, so thank you for that sample! I just tried your RegEx and I did get a different result on one of the two entries that QF was acting on. On the first entry, it still did not add the owner's name into the field, even though I can see in the output results that the Pattern Matching activity DID successfully pull the correct name. On the second entry in the test run, it DID find the property owner's name AND DID successfully place that result in the Owners name field, but it also picked up the next line, which included the guy's box number.

replied on April 6, 2023

That's weird - it's like it ignored the line break after the name.  But .* shouldn't include line breaks.  I wonder if QF handles that part of RegEx differently...
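
One possible explanation, offered as an assumption rather than confirmed Quick Fields behavior: in some regex engines, including Python's re used below, the dot excludes only \n, so a bare \r counts as an ordinary character and .* runs straight through it. The sketch uses made-up sample text with \r-only line breaks and compares the dot-based group with an explicit character class that stops at either line-break character.

```python
import re

# Hypothetical text whose lines end in a bare \r (no \n).
cr_only = "OWNERS:\rJOHN DOE\rPO BOX 12\r"

# "." treats \r as a normal character here, so the capture keeps going.
dot_capture = re.search(r"OWNERS:\s*[\r\n]*(.*)[\r\n]*", cr_only).group(1)

# [^\r\n]* stops at the first \r or \n, whichever comes first.
class_capture = re.search(r"OWNERS:\s*([^\r\n]*)", cr_only).group(1)

print(repr(dot_capture))    # 'JOHN DOE\rPO BOX 12\r'
print(repr(class_capture))  # 'JOHN DOE'
```

If the live text does use bare \r line breaks, replacing (.*) with ([^\r\n]*) is one way to keep the box number out of the capture without relying on how the engine defines the dot.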

Where you said that (.+?) was working in your testing instead of (.*) - why don't you try using (.+?) again in the example where it grabbed the box number after the name?

replied on April 6, 2023

Okay, I can report that I now have all the fields filling at least some of the time. For some reason, the owner's names do not always fill, but if they fill sometimes, then at least I know the configuration is right. Something else is stopping it sometimes.

In the last test, there were about ten documents separated into individual entries and handled.  One did not fill in the owner name because there were 3's accidentally filling spaces all over the document.  Next one I couldn't see a reason why.  Next one I couldn't see why and the next one was the exact same as the previous one (exact same name and look on the page!) and yet it DID fill in!

replied on April 5, 2023

Are these processes set to run on all pages or just page 1?

replied on April 5, 2023

Matthew - Yes, on the pages where OWNER is typed, there is sometimes a blank line between the word OWNER and the actual line where the owner's name is.  But in testing, it worked even if the blank line is there so I was hoping that wasn't it and I wasn't savvy enough to figure out how to change the RegEx to get it to work even if there was a blank line.  Thanks for all the info!

Miruna - I have tried it both ways... telling it to run on all pages (because sometimes the owner name is on the second page... although not in the two tests I was using) and then changing it to run on just the 1st page hoping for better results, but... no.

replied on April 5, 2023

Double checking the Zone OCR.  Of note is:  This zone is used to collect two things (1) owner's name and (2) subject info for a Subject field in the template.  The Subject field is getting what is needed as per the Pattern Matching config for that one, yet the owner name isn't getting filled.
