Question

Workflow Find/Replace section of a Pattern Match

asked on January 28, 2015

The OCR process does not correctly read one of our reference number values: some of the letters come in as digits. How can we adjust the pattern-matched values to fix the incorrect strings?

Example:

Original Item: B5-123456-OI

OCR'd as either: B5-123456-0I, B5-123456-01, or B5-123456-O1

I'd like to include an adjustment that will rename 0I, 01, or O1 to OI in the workflow before processing the values.

I can see the QuickFields program has a substitution function but this does not seem to be available in Workflow.  Is there some other function that could be used on the value to adjust?


Answer

SELECTED ANSWER
replied on February 2, 2015

Hi Mark,

You can try creating a new token from the original item using the token editor. The pattern matching in the token editor would contain all the possibilities (01, O1, etc.):

%(OriginalItem#<^(B5-\d{6}-)01$|^(B5-\d{6}-)O1$>#)

The parentheses capture the beginning of %(OriginalItem), excluding the trailing 01 or O1. Append OI after the token to produce the corrected value:

%(OriginalItem#<^(B5-\d{6}-)01$|^(B5-\d{6}-)O1$>#)OI

Of course, a conditional should be used beforehand so that values ending in AI, AE, etc. are left unchanged.
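For anyone who wants to test the logic outside Workflow, here is a minimal Python sketch of the same idea: capture the prefix, drop the misread suffix, and append OI. It is only an illustration, not Workflow token syntax. The function name `fix_reference` is hypothetical, and the `0I` alternative is an assumption added because the question lists it as a misread.

```python
import re

# Same prefix pattern as the token above, merged into one alternation.
# 0I is included as an assumption, since the question lists it as a misread.
MISREAD = re.compile(r"^(B5-\d{6}-)(?:01|O1|0I)$")


def fix_reference(value: str) -> str:
    """Return value with a misread suffix replaced by OI, otherwise unchanged."""
    match = MISREAD.match(value)
    if match is None:
        # Suffixes such as AI or AE do not match and are left alone.
        return value
    return match.group(1) + "OI"


for sample in ("B5-123456-01", "B5-123456-O1", "B5-123456-0I", "B5-123456-AE"):
    print(sample, "->", fix_reference(sample))
```

Because the pattern only matches the three misread suffixes, it also plays the role of the conditional: anything else passes through untouched.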

Regards,


ReplacedByOI.PNG (46.41 KB)

replied on February 5, 2015

Excellent, Thanks everyone!

We used this with a few slight adjustments. Works like a charm so far.

%(PatternMatching_Reference#<(B\d{1,2}-\d{6}-)>#)OI

We then built some Routing tables for other possible areas that could error.


Replies

replied on February 2, 2015

Hi Mark,

I'd also recommend checking that your optimization priority for OCR and Snapshot is set to Accuracy. To check that setting for the OCR, in the Client go to Tools -> Options -> Generate Text -> General; set the Optimization Priority to Accuracy. For Snapshot, hit Print as usual and in the popup window, go to the Text tab and set the Optimization Mode to Accuracy. If you find that these fixes don't help enough or you'd like more help with the pattern matching, please let us know!

replied on February 5, 2015

This was one of the first things we tried, but sadly it didn't work for this specific job.

replied on January 28, 2015

Is the "OI" always "OI" or do the letter combinations change?

replied on January 28, 2015

Hi Mark,

I have a couple of additional questions for you. What type of documents are these (pdf, .doc, .tif, etc.)? Also, what is the ultimate goal of the text within the text pane? Are you making sure this text is accurate so the value can be used in metadata? If not, can you provide more detail about how you plan to use the text within the text pane?

replied on June 9, 2017

Tanya,

I see this post was from a couple of years ago, but I am encountering the same thing with our VINs. By national naming standards, VINs do not contain I's or O's, only 1's and 0's (zeros). After my documents go through the OCR process, the Workflow runs pattern matching on the VIN to pull it from the Lien or Title, and sometimes the VIN comes back with I's or O's. I am at a loss on the expression to use in my pattern matching token, because you never know where they will show up in the VIN and there could be multiple I's or O's. The VIN will be anywhere from 6 to 17 characters in length.

Any help is truly appreciated!!

Robin H

Oklahoma Tax Commission

VIN Pattern Match replace I with 1.JPG
replied on June 9, 2017

Pattern Matching does not replace values; it only does data validation and extraction. You can twist it into a semblance of replacement, as the selected answer in this post shows, but that gets unwieldy fast when you deal with more unstructured data or longer strings like in your case.

Use the Token Calculator activity and its Substitute function instead.

replied on August 2, 2018

@████████

Did you find a solution to your VIN issue? I am doing a similar thing with VIN's here and getting I's and O's instead of 1's and 0's.

VIN's here are always 17 characters.

replied on November 20, 2018

Jonathan Roberts,  sorry I didn't get back to you. We ended up writing a script through workflow that would do a substitution on the I's and O's to 1's and 0's.
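The script itself was not posted, so the following is only a rough Python sketch of what such a substitution could look like, not the code actually used in the workflow. The function name `normalize_vin` is hypothetical, and uppercasing the input first is an assumption.

```python
# Map the letters a VIN should never contain to their look-alike digits.
LOOKALIKES = str.maketrans({"I": "1", "O": "0"})


def normalize_vin(raw: str) -> str:
    """Replace every I with 1 and every O with 0, wherever they appear."""
    # Uppercasing first is an assumption so that lowercase i/o are caught too.
    return raw.strip().upper().translate(LOOKALIKES)


print(normalize_vin("IHGCM82633A0O4352"))
```

A character-by-character substitution like this does not care where the misread letters occur or how many there are, which is the part that is awkward to express in a pattern matching token.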

 

Robin

replied on January 30, 2015

The letter combinations could change, but OI is the common one that gets mistaken, which is why I want to do a find/replace on the mistaken values. Alternatively, I would even have the pattern match accept only letters, not digits, in that position.

The possible combinations are: AI, AE, AD, OI, OE, OD, TI, TE, TD
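As an illustration of how the full list could be handled in one pass, here is a Python sketch that converts look-alike digits to letters in the two-character suffix only and then checks the result against the allowed combinations. It is not Workflow syntax. The function name `repair_suffix` is hypothetical, and the prefix pattern `B\d{1,2}-\d{6}-` is borrowed from the adjusted token posted elsewhere in this thread.

```python
import re
from typing import Optional

# The valid suffixes listed above.
ALLOWED = {"AI", "AE", "AD", "OI", "OE", "OD", "TI", "TE", "TD"}

# Prefix, then a two-character suffix that may contain misread digits.
REFERENCE = re.compile(r"^(B\d{1,2}-\d{6}-)([A-Z0-9]{2})$")

# Digits that OCR tends to produce in place of the letters O and I.
DIGIT_TO_LETTER = str.maketrans({"0": "O", "1": "I"})


def repair_suffix(value: str) -> Optional[str]:
    """Return the corrected reference, or None if it cannot be repaired."""
    match = REFERENCE.match(value)
    if match is None:
        return None
    prefix, suffix = match.groups()
    # Only the suffix is translated; digits in the prefix stay as they are.
    fixed = suffix.translate(DIGIT_TO_LETTER)
    return prefix + fixed if fixed in ALLOWED else None


for sample in ("B5-123456-01", "B5-123456-T1", "B5-123456-0E", "B5-123456-XX"):
    print(sample, "->", repair_suffix(sample))
```

Returning None for anything outside the allowed list leaves room to route those documents for manual review instead of guessing.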

The documents are printed into Laserfiche using Snapshot from a PDF, and we then use these values in the Metadata to find the corresponding documents and merge them into one file through a workflow.
