
Question

Questions with Topic Quick Fields

asked on November 12, 2013

 

I have a multiline ZOCR in QF that looks like
Information    1    Zone 2 : Work Order#
666

 

There is a CR/LF x0Dx0A after the #

 

I am trying to write a regular expression to pick out the 666, but I cannot get it to match beyond the CR/LF. The expression is Order#\s*(d+). As I type it into the test field of QF, the result value is shown as I type the expression. Up to Order#\s* all is good, but when I type 6 it will not display.

If I use ^.+, all characters are displayed as they should be, even beyond the 666 (I have more text after it), so I don't think QF is holding me back because of some multiline setting. Please advise.

0 0

Answer

SELECTED ANSWER
replied on November 13, 2013

Kenneth

I have a working solution. It drives me crazy that I can't use the test function, but I've built a workaround for that now. I just tried \D*? and it works fine also. Thanks again.

1 0
replied on November 13, 2013

You may also want to run a few different pattern matches and use a conditional to pick the best (or only) match as the field value.
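
The idea of trying several patterns and keeping the first one that matches can be sketched as follows. This is a minimal illustration in Python's `re` module, not Quick Fields or Workflow syntax; the function name `first_match` and the pattern list are hypothetical, and the two patterns are simply the ones discussed in this thread.

```python
import re

# Candidate patterns, ordered from most to least specific (assumed order).
PATTERNS = [
    r"Order#\s*\n(\d+)",   # label, optional whitespace, a line feed, digits
    r"Order#\D*?(\d+)",    # label, any run of non-digits, digits
]

def first_match(text, patterns=PATTERNS):
    """Return the capture from the first pattern that matches, else None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None

print(first_match("Work Order#\r\n666"))
print(first_match("Work Order# : 42"))
```

In Quick Fields itself the equivalent would presumably be several Pattern Matching processes feeding a conditional, as the reply describes.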

0 0

Replies

replied on November 12, 2013

If you have it set to a multi-value token, you may want to try only taking in line 2. 

 

If you are trying to get around the CR, then maybe you should try a \D?\D? (or \D?*) instead of the \s* 

^ That should hopefully work. Otherwise I'd refer to the button on the right of the regular expression field to find the special characters designation you are looking for. 

 

Also, I believe you need a \d+ in the matching section, not d+

 

EDIT: I tried my suggestion of '\d+' instead of 'd+' inside the matching group and it seems to work for me. Of course, this was inside Pattern Matching in Workflow Designer.
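
To illustrate the difference between `d+` and `\d+`, here is a minimal sketch using Python's `re` module. Python is only a stand-in here (Quick Fields and Workflow use their own regular expression engine), and the sample text is reconstructed from the question on the assumption that a CR/LF follows the `#`.

```python
import re

# Sample text reconstructed from the question (assumed CR/LF after the '#').
text = "Zone 2 : Work Order#\r\n666"

# 'd+' matches one or more literal letter 'd' characters, not digits.
literal_d = re.search(r"Order#\s*(d+)", text)

# '\d+' matches one or more digits.
digits = re.search(r"Order#\s*(\d+)", text)

print(literal_d)        # no match object, because no 'd' follows the line break
print(digits.group(1))  # the captured work order number
```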

0 0
replied on November 12, 2013

Regular expressions are very literal. So if you don't specify there is a newline character between the "Order#" and the number, Pattern Matching will reject anything with a newline character in between.

 

^.+ is basically saying "everything after the beginning of the input (white space or non-white space characters)", so it is including the newline character and getting everything.

 

Try: Order#\s*\n(\d+)
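
As a rough check of this pattern outside Laserfiche, the sketch below runs it in Python's `re` module against two hypothetical OCR outputs, one with a bare line feed and one with CR/LF. The sample strings are assumptions, and Python's engine may not behave the same way as the Quick Fields test dialog.

```python
import re

pattern = re.compile(r"Order#\s*\n(\d+)")

# Hypothetical OCR outputs: one with a bare line feed, one with CR/LF.
samples = {
    "LF": "Work Order#\n666",
    "CRLF": "Work Order#\r\n666",
}

results = {}
for name, text in samples.items():
    match = pattern.search(text)
    results[name] = match.group(1) if match else None

print(results)
```

Note that in Python `\s` itself matches both `\r` and `\n`, so the plain `Order#\s*(\d+)` also matches these samples there; that does not settle how the Quick Fields dialogs treat it.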

0 0
replied on November 12, 2013

Your suggestion is good but you may want to limit that just slightly. We cannot guarantee in a Quick Fields session that the same thing happens in all cases. It would be best to include some question marks or {0,xx} in there to allow for the times when this is not the case.

0 0
replied on November 12, 2013

I thank you both for your quick response. Unfortunately neither worked. Logic says, and another regex editor I used says, that Order#\s*(\d+) works, but as you can see here it does not. I tried the \D?\D? to no avail. The problem may be that I am not using a token. As you can see, I am applying the regular expression to a field directly. Is this a problem? I will go back and create a token, but this seems like an extra step. Also, this does NOT work in WF either. I tried to test it by putting a Routing Activity into a new WF and then opening the Token Dialog box for a Date field (it looks just like the attached screens).

Bob

11-12-2013 2-36-24 PM.png
11-12-2013 2-38-50 PM.png
0 0
replied on November 12, 2013

I see two issues. 

 

In the first screenshot, you are having both the \n and \s* which is unnecessary. You want to at least add a '?' to the end of either or both of those and you may have better results.

 

In the second screenshot, it seems you have removed a space between 'Order' and '#', and that is a potential reason for the unaccepted pattern match. Try '\D*?' instead of the '\D?\D?' I suggested and I think you will also get better results.

0 0
replied on November 12, 2013

Like I said above, Order#\s*(\d+) can't possibly work because it's not accounting for newline characters.

 

I've used the text file you attached to test the pattern and it worked. You can try specifying that there could be more than one newline character in between the 2 lines: Order#\s*\n*(\d+). You could also try \r\n for the newline character (I don't expect that to work because OCR uses \n, but it's worth a try in case you're running into something weird).
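
The two suggestions above (allowing more than one newline, and allowing `\r\n`) can be combined into a single pattern. The sketch below is an assumed variant written for Python's `re` module, not a pattern taken from the thread or verified in Quick Fields; `work_order` is a hypothetical helper name.

```python
import re

# Assumed variant: optional spaces/tabs, then any number of LF or CR/LF
# line breaks, then optional spaces/tabs, then the digits to capture.
pattern = re.compile(r"Order#[ \t]*(?:\r?\n)*[ \t]*(\d+)")

def work_order(text):
    """Return the captured number, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(work_order("Work Order#\n666"))
print(work_order("Work Order# \r\n\r\n666"))
print(work_order("Work Order#666"))
```

Spelling out the line break as `\r?\n` makes the pattern independent of whether the text source emits LF or CR/LF.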

1 0
replied on November 12, 2013

Kenneth, from the looks of your dialog box it appears you are using an 8.x version of QF.  Mine is 9.0.0.  That may make a difference

0 0
replied on November 12, 2013

I was using Workflow Designer actually, not Quick Fields to test this functionality.

 

You may want to make the Zone OCR a multi-value field and use the 2nd index of the value to get the information you want.
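
The multi-value idea amounts to splitting the zone text into lines and taking the second one. The sketch below shows that in Python as a stand-in; it does not use Quick Fields token syntax, and the sample text is reconstructed from the question.

```python
# Zone OCR text reconstructed from the question (assumed CR/LF line break).
zone_text = "Information    1    Zone 2 : Work Order#\r\n666"

# splitlines() handles both LF and CR/LF line endings.
lines = zone_text.splitlines()

# Index 1 is the second line, assuming the number always lands there.
second_line = lines[1].strip() if len(lines) > 1 else None
print(second_line)
```

This only helps when the value is always on the same line of the zone.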

0 0
replied on November 12, 2013

Kenneth,

I was given two solutions, so I was trying both. There is no space between Order and #. It looks that way, but it is not there. I am certain of all the characters, as can be seen in this attachment. Also attached is a WF token editor where \D*? does not work. I have rebooted the machine and have no virus scanner on. I can't use a multi-index field, as I have a number of these to fill in and I cannot always be sure it will be the 2nd or 3rd row. Thanks for any other suggestions you might have.

11-12-2013 3-14-46 PM.png
11-12-2013 3-17-08 PM.png
0 0
replied on November 12, 2013

Miruna,

I think it is something weird. I opened a new session also. Attached is my latest screenshot per your suggestion, and it does not work on my machine. I have spent a number of hours on this trying many different things without success, including \r\n

11-12-2013 3-25-25 PM.png
0 0
replied on November 12, 2013

Oddly, I also could not get this to work when inside a regular expression editor in the token dialog box. I then tried the original proposed solution you had and got a hit

 

I used the exact text from the txt file you provided, and you are indeed correct about some weird character being inside it, but I think this may come down to the different ways of modifying a token behaving differently in Quick Fields or Workflow Designer.

 

I recommend you try modifying the token by means of Pattern Match inside of Quick Fields in order to get the results you want. It should work fine then since the regular expression in the token dialog box does not seem to work.

0 0
replied on November 12, 2013

Can you try scanning a document rather than testing the output in the Token Dialog?

0 0
replied on November 12, 2013

Kenneth

Not sure what you mean by modifying inside QF.  That is what I have been doing.  Have tried this inside QF regardless of what token editor said and it did not work.  Please expand on your thoughts.  Thanks

0 0
replied on November 12, 2013

Kenneth

Got it working.  Will report the bug to support.  Comforting to know two of us get the same thing.  Thanks for your help

0 0
replied on November 13, 2013

What ended up working? I would like to know

0 0
replied on November 13, 2013

Kenneth

What would never work in any of the test facilities, but worked fine when actually running/scanning, was Order#\s*\n(\d+). I did 6 other fields with variations of this. \x0D\x0A would NOT work, which baffles me. I did not go back and try your suggestion \D?*, but will later today, as it should work. The more I read on the MS page, the more it seems \n is treated specially, and this is the point Miruna was trying to make yesterday: \n has to be dealt with on its own. The rest of the characters seem to do fine, apparently. Thanks again for your help.

0 0
replied on November 13, 2013

I would try out adding question marks between those values outside the matching to try and increase the potential accuracy. 

 

Does this mean you are still without a proper solution at this point?

0 0
replied on November 13, 2013

\x0D\x0A is not a valid regular expression, so it is treated as a literal. The regular expression representation would be \r\n.

 

Newlines are usually \n or \n\n for text obtained from OCR in LF, either full text or Zone OCR.

0 0
replied on November 13, 2013

Miruna,

I don't understand. \x00 comes right out of the list of valid character escapes on both QF and WF.  What would be the proper way to use a character escape then in reg exp?

0 0
replied on November 13, 2013

I'm not seeing \x00, \x0D or \x0A in the list available from QF and WF. The character escapes listed there are \r for carriage return and \n for newline.

0 0
replied on November 13, 2013

Miruna

Open up the Token Window in either QF or WF, pick any variable like Date, check the Regular Expression box, then click on the pattern button on the right and the character classes. Also, there is the help file:

Character Escapes

Most regular expression language operators are unescaped single characters. The escape character \ (a single backslash) signals to the regular expression parser that the character following the backslash is not an operator.

Example: The parser treats an asterisk (*) as a repeating quantifier and a backslash followed by an asterisk (\*) as the Unicode character 002A.

Note: The character escapes listed are recognized both in regular expressions and in replacement patterns.

Regular Expression    Description
                      Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.
\a                    Matches a bell (alarm) \u0007.
\b                    Matches a backspace \u0008 if in a [] character class. Otherwise, see the note following this table.
\t                    Matches a tab \u0009.
\v                    Matches a vertical tab \u000B.
\f                    Matches a form feed \u000C.
\n                    Matches a new line \u000A.
\e                    Matches an escape \u001B.
\040                  Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. For example, the character \040 represents a space.
\x20                  Matches an ASCII character using hexadecimal representation (exactly two digits).
\cC                   Matches an ASCII control character. For example, \cC is control-C.

So can you tell me why \x0D\x0A does not work?  Tks 

0 0
replied on November 13, 2013

Oh, I see, sorry about that, not sure how I missed it. They're not being read as a group (like I mentioned, regex is very literal). They're read as "\x" and other characters. You need to wrap them in a non-capturing group, "(?:)", to get them to be considered together.

 

Order#(?:\x0a)*(\d+)
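
For comparison, here is how the hex escapes behave in Python's `re` module, used only as a stand-in since the Laserfiche engine may differ. There, `\x0D\x0A` is equivalent to `\r\n`, and the pattern above matches only when the line break is a bare line feed. The `cr_lf` variant with an optional `\x0d` is an assumption added for illustration, not something proposed in the thread.

```python
import re

# In this engine, \x0D and \x0A are hex escapes for CR and LF.
hex_equals_crlf = re.fullmatch(r"\x0D\x0A", "\r\n") is not None

lf_only = re.compile(r"Order#(?:\x0a)*(\d+)")       # pattern from the reply
cr_lf = re.compile(r"Order#(?:\x0d?\x0a)*(\d+)")    # assumed CR/LF-tolerant variant

lf_text = "Work Order#\n666"
crlf_text = "Work Order#\r\n666"

print(hex_equals_crlf)
print(bool(lf_only.search(lf_text)))
print(bool(lf_only.search(crlf_text)))
print(bool(cr_lf.search(crlf_text)))
```

If the text in a test dialog contains CR/LF while scanned OCR text contains only LF, a difference like this could produce different results between testing and scanning, though the thread does not confirm that this is the cause.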

 

0 0