Question

Pattern Matching

asked on March 30, 2017

Hi,

I have a large index field on a record, which is collected by our scanning software and imported into Laserfiche.

I am looking at the Pattern Matching WF option to try to pull certain bits of information from this OCR'd field. An example of the field's value is:

v va^•__________________________________^WORK ORDER: MM1462002638^• • Instrument washer: 1462^Start cycle date: 20/11/2015 - Start cycle time: 17:03:34^End cycle date: 20/11/2015 - End cycle time: 17:49:32^Machine Cycle: INSTRUMENTS^Parcel code: 2638 itaoki^Operator: LOAD MANUAL Signature:

 

I am trying to create a token with the WORK ORDER number MM1462002638. I then need to split this new token down to:

New Token 1 = 1462 (the four digits after MM)

New Token 2 = 2638 (the last 4 digits of the number)

I then need to pull the date after "Start cycle date" as:

New Token 3 = 20/11/2015

Any help, or different ways to achieve this, would be very much appreciated.

 

Thanks,

Anthony

 

Replies

replied on March 30, 2017

Hi Anthony,

You could use one single Pattern Match activity to generate all of these tokens, in order:

  • WorkOrder
    • Input: %(OCRedValue)
    • Pattern: WORK\W*ORDER:\W*(MM\d+)
      • Look for "WORK ORDER:" and match the string beginning with "MM" followed by at least one digit character 0-9.
    • Result: MM1462002638
  • NewToken1
    • Input: %(WorkOrder)
    • Pattern: MM(\d{4})
      • Look for "MM" and match the first 4 digit characters following.
    • Result: 1462
  • NewToken2
    • Input: %(WorkOrder)
    • Pattern: (\d{4})$
      • Match the 4 digit characters at the end of the string.
    • Result: 2638
  • StartCycleDate
    • Input: %(OCRedValue)
    • Pattern: Start\W*cycle\W*date:\W*(\d{1,2}\/\d{1,2}\/\d{4})
      • Look for "Start cycle date:" and match the string of the form dd/MM/yyyy, dd/M/yyyy, d/MM/yyyy, or d/M/yyyy.
    • Result: 20/11/2015

The "\W*" could be replaced by a plain space character; I used it to be safe in case the OCR activity produced multiple whitespace characters, or none at all, between the words. For more details on regular expression syntax, you can consult this W3Schools resource.
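If you want to try these patterns outside Workflow first, here is a rough sketch using Python's `re` module as a stand-in for the Pattern Matching activity. This is only an assumption-laden approximation: Workflow has its own regex engine, `first_group` is a hypothetical helper I made up for the example, and the sample text is a shortened copy of the value from the question.

```python
import re

# Shortened copy of the OCR'd value from the question (leading junk omitted).
ocr_value = (
    "WORK ORDER: MM1462002638^• • Instrument washer: 1462^"
    "Start cycle date: 20/11/2015 - Start cycle time: 17:03:34^"
    "End cycle date: 20/11/2015 - End cycle time: 17:49:32"
)


def first_group(pattern, text):
    """Hypothetical helper: return capture group 1 of the first match, else None."""
    if text is None:
        return None
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Each token uses the same input and pattern as the list above.
work_order = first_group(r"WORK\W*ORDER:\W*(MM\d+)", ocr_value)
new_token_1 = first_group(r"MM(\d{4})", work_order)
new_token_2 = first_group(r"(\d{4})$", work_order)
start_cycle_date = first_group(
    r"Start\W*cycle\W*date:\W*(\d{1,2}\/\d{1,2}\/\d{4})", ocr_value
)

print(work_order, new_token_1, new_token_2, start_cycle_date)
```

Because NewToken1 and NewToken2 take the WorkOrder token as their input, they only produce a value when the first pattern has matched.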

Hope this helps!

replied on March 31, 2017

Thank you!  I have been playing with the pattern matching and it is working very nicely indeed.

I do have one issue with another pattern matching scenario. I am trying to pull a name from a Name: field.

Sometimes the field label is Name: or Name. or just Name

I don't want to pick up the : or . after the label, I just want the actual name.

Also, how do I deal with a hyphenated name?

So for example:

Name. John Smith-Wilson

Would return the name:

John Smith-Wilson

AND

Name: John Smith

Would return the name:

John Smith

 

replied on March 31, 2017

If there is nothing else in the line after the name, then a pattern like this should work:

Name[/.:]?\s+?(\w.*)\n?
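For a quick check of that pattern against the examples in the question, here is a small sketch in Python's `re` module. Python is only standing in for Workflow's pattern matching here, and `extract_name` is a hypothetical helper name, not anything from Laserfiche.

```python
import re

# Same pattern as above: optional "/", "." or ":" after "Name", then whitespace,
# then capture everything from the first word character to the end of the line.
NAME_PATTERN = re.compile(r"Name[/.:]?\s+?(\w.*)\n?")


def extract_name(text):
    """Hypothetical helper: return the captured name, or None if nothing matches."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None


for sample in ("Name. John Smith-Wilson", "Name: John Smith", "Name John Smith"):
    print(repr(sample), "->", repr(extract_name(sample)))
```

The hyphen needs no special handling because `.*` matches any character up to the end of the line. By the same token, any trailing spaces on the line would be captured too, so trimming the token afterwards may be worthwhile.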

replied on April 4, 2017

Thank you for your help with this, just one more question.

With regard to the first pattern matching:

Sometimes the Work Order Number is OCR'd in one of the following formats:

M M 123456789

Or

M M123456789

Or

MM 123456789

Or

MM123456789

How can I catch each format, with all the white spaces, etc?

 

Thanks,

replied on April 4, 2017

You can use this:

(M)\W*(M)\W*(\d+)

We can use the \W* trick from before, where \W matches all non-word characters; i.e. all characters not 0-9, a-z, A-Z or _. Alternatively you could use \s to match whitespace only (tabs, spaces, newlines). The asterisk matches 0 or more of this type of character.

This pattern will return three matches: the first "M", the second "M", and the string of digit characters. In Workflow, you can set the token to combine multiple matches with no spaces. See attached screenshot.

There might be a way using regex only to return one match, but if such a pattern exists I think it would be unnecessarily complicated given the tools already provided in Workflow pattern matching.
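To illustrate the idea of combining the three matches with no spaces, here is a minimal sketch in Python's `re` module. It only imitates what the Workflow token setting does, and `join_work_order` is a hypothetical name chosen for the example.

```python
import re

# Three capture groups: first "M", second "M", and the run of digits.
# \W* between them absorbs any spaces or other non-word characters.
WORK_ORDER_PATTERN = re.compile(r"(M)\W*(M)\W*(\d+)")


def join_work_order(text):
    """Hypothetical helper: join the three captured groups with no separator."""
    match = WORK_ORDER_PATTERN.search(text)
    if match is None:
        return None
    return "".join(match.groups())


for sample in ("M M 123456789", "M M123456789", "MM 123456789", "MM123456789"):
    print(repr(sample), "->", join_work_order(sample))
```

All four formats from the question should come out as the same joined value, which can then be fed into the NewToken1 and NewToken2 patterns from earlier.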

Hope this helps!

multimatch_regex.png
replied on April 27, 2017

Thanks for the help on this.  It's working as expected, just can't quite get my head around the syntax used...

 

Anthony
