
Question

Pattern Matching Formula

asked on September 21, 2021

Hi,

Can someone help with this pattern matching? I'm trying to pick up this Average Essential Score, but I'm not sure how to make it work. I went to https://regex101.com/ and I'm having a little trouble understanding it, so I figured it would be easier to just ask the professionals.

Average Essential Score.jpg

Replies

replied on September 21, 2021

Are you using Quick Fields or just pulling text from the whole document and trying to Pattern match in Workflow? If QF, answer the next question.

Does the box ever move? If it's a digital document and the placement of the box never moves, then just zone OCR the score under the text 'Average Essential Score:'

If you are using Workflow with the 'retrieve document text' activity, post the results of that activity. There are a couple of different ways the text can be generated, and your pattern will depend on how the generated text comes out.

replied on September 21, 2021

I got this far. I just can't get the pattern to continue onto the next line:

Average Essential Score:[^\r\n]\s(\d\.\d)
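
A minimal sketch in Python of why this attempt stops short, assuming the extracted text puts the label on one line and the score on the next, separated by `\r\n`. The sample text and the score 3.5 are made up, and Quick Fields has its own regex engine, so treat Python's `re` as an approximation. `[^\r\n]` matches exactly one character that is not a line break and `\s` matches exactly one whitespace character, so the pattern cannot step over the line break that follows the colon. Replacing both with `\s*` is one possible fix.

```python
import re

# Hypothetical OCR output: label on one line, score on the next.
sample = "Average Essential Score:\r\n3.5\r\n"

# The attempt from the post: one non-line-break character, then one whitespace.
attempt = re.compile(r"Average Essential Score:[^\r\n]\s(\d\.\d)")

# A relaxed variant (an assumption, not from the thread): any run of whitespace,
# including the line break, between the label and the digits.
relaxed = re.compile(r"Average Essential Score:\s*(\d\.\d)")

attempt_match = attempt.search(sample)
relaxed_match = relaxed.search(sample)

print(attempt_match)           # None: the character after ':' is '\r'
print(relaxed_match.group(1))  # 3.5
```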

replied on September 21, 2021

Hi Carlos

In these situations I use a couple of Pattern Matches. The first, like yours, strips out the CR/LFs with [^\r\n]+, and I choose Combine results, which gives a single string made of the matched pieces. Sometimes I even remove spaces and special characters to create tight text: [^\r\n\s:;]+

The 2nd Pattern Match uses the output of the first as its input, so I can write a simpler regex such as Essential\s*Score:\s*(\d\.\d) to get the digits.
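
A rough Python sketch of this two-step idea, under a few assumptions: the `combine` helper is a hypothetical stand-in for the Combine results option, the sample text is invented, and the second pattern is written as `Essential\s*Score:\s*(\d\.\d)`. Note that the tight-text variant also strips the colon and the spaces, so the second pattern has to be adjusted to match.

```python
import re


def combine(pattern: str, text: str) -> str:
    """Join every match of `pattern` into one string (stand-in for 'Combine results')."""
    return "".join(re.findall(pattern, text))


# Hypothetical extracted text with the score on its own line.
sample = "Average Essential Score:\r\n3.5\r\nNext section"

# Step 1: keep everything except line breaks, glued into a single line.
one_line = combine(r"[^\r\n]+", sample)
# Step 2: a simpler pattern run against the single-line text.
score = re.search(r"Essential\s*Score:\s*(\d\.\d)", one_line)

# Tight-text variant: whitespace, ':' and ';' are removed as well,
# so the second pattern no longer contains the colon.
tight = combine(r"[^\r\n\s:;]+", sample)
tight_score = re.search(r"EssentialScore(\d\.\d)", tight)

print(one_line)              # Average Essential Score:3.5Next section
print(score.group(1))        # 3.5
print(tight)                 # AverageEssentialScore3.5Nextsection
print(tight_score.group(1))  # 3.5
```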

If you copy the OCR text from the text pane of the file in the repository and paste it here, I can be more specific.

replied on September 21, 2021

That is a PDF created from MS Word, and that box will always be there. So the image I uploaded is exactly how it will always come in. I'm using QF for this project.

replied on September 21, 2021

Sometimes the simplest solution is the best. If you are using QF, just put your zone OCR box around the number you want. Done.

As long as the placement of this value never moves on the document, you are done. Using QF on digital documents removes one of the hardest things to control with scanned documents: the LOCATION of data.

replied on September 21, 2021

That would be the easiest. However, that section can move one page down if the user adds text to any of the pages above it. That's the reason I have to use Pattern Matching: I will never know on what page that section will land.

replied on September 21, 2021

Ok, so when you test the zone OCR box you are using, what is the output? 

replied on September 21, 2021

OK, so I put in an OmniPage Zone OCR and assigned it to page 5 in its Page Range section. That works great if, and only if, that box is on page 5.

Then I ran my test PDF, and it gave me gibberish characters because that box is now on page 6.

replied on September 21, 2021

Now we are dealing with 2 issues. 

The first was a pattern to extract the data you want.

The second is getting the data when it's not on a consistent page.  

Maybe you need to bring in your VAR or SP, whatever they call them now. 

If you could provide the text output from a test run of the Zone OCR around the area you want to extract data, we could come up with a pattern. 

You are trying to process a document of 5 or 6 or who knows how many pages with QF, and I have no idea what you are using for first-page identification, last-page identification, generating pages, extracting the text layer from the PDF, or a new doc class per document, so the complexity has increased past what I can do here.

Good luck.  

replied on September 21, 2021

Thanks for all your help. I will get them involved.

replied on September 21, 2021

As your information moves from page to page, you need to use plain OmniPage OCR and not Zone OCR. If you are using Document Classification, then under Page Processing you can add Pattern Matching and choose all pages as the page range.

Create a Token with the following regex
Essential\s*Score:\s*\r\n\s*(\d\.\d)
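
A small Python sketch of how this token pattern behaves, assuming the OCR text separates the label and the score with a Windows-style `\r\n`. The sample strings are invented, and Quick Fields has its own regex engine, so this is only an approximation. The literal `\r\n` in the pattern means a bare `\n` line ending would not match; the more tolerant variant shown alongside it is an assumption, not part of the reply.

```python
import re

# The token pattern from the reply: requires a literal CR/LF after the label.
TOKEN = re.compile(r"Essential\s*Score:\s*\r\n\s*(\d\.\d)")

# A tolerant variant (assumption): \s* already covers \r, \n, spaces and tabs.
TOLERANT = re.compile(r"Essential\s*Score:\s*(\d\.\d)")

# Hypothetical extracted text with two different line-ending styles.
crlf_text = "Average Essential Score:\r\n3.5\r\n"
lf_text = "Average Essential Score:\n3.5\n"

token_crlf = TOKEN.search(crlf_text)
token_lf = TOKEN.search(lf_text)
tolerant_lf = TOLERANT.search(lf_text)

print(token_crlf.group(1))   # 3.5
print(token_lf)              # None: there is no '\r' before the '\n'
print(tolerant_lf.group(1))  # 3.5
```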

replied on September 21, 2021

If you are not using Classification, Pattern Matching in Pre-Processing does not give you the same All Pages option; it is limited to a page or line. At that point you might want to look at using a workflow to capture this with Pattern Matching after the document has been uploaded to the repository.
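
To illustrate the idea of matching against every page rather than a fixed one, here is a hedged Python sketch. It assumes the text of each page is already available as a list of strings; the `find_score` helper and the page contents are hypothetical, and no Quick Fields or Workflow API is involved.

```python
import re
from typing import Optional, Sequence, Tuple

SCORE = re.compile(r"Essential\s*Score:\s*(\d\.\d)")


def find_score(pages: Sequence[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based page number, score text) for the first page that matches."""
    for number, text in enumerate(pages, start=1):
        match = SCORE.search(text)
        if match:
            return number, match.group(1)
    return None


# Hypothetical document where extra text pushed the box from page 5 to page 6.
pages = [
    "Cover page",
    "Introduction",
    "Details",
    "More details",
    "Extra paragraph added by the user",
    "Average Essential Score:\r\n3.5\r\n",
]

result = find_score(pages)
print(result)  # (6, '3.5')
```

Because the search runs over every page, it does not matter whether the section lands on page 5 or page 6.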

replied on September 21, 2021

There's a lot of good discussion here already about Quick Fields, but I also wanted to chime in and mention that this sort of thing may be easier to accomplish in Capture Profiles. The "Anchored Zone" functionality lets you configure it to find a particular piece of text and then capture whatever is underneath it, without necessarily having to use pattern matching. An example screenshot is below and you can read more here

Draw a box around the information you want to capture and anchor it to another bit of text on the page.

Your Solution Provider should be able to discuss Capture Profiles with you, if you're interested.  

replied on September 21, 2021

When will a feature like this be available for an on-prem installation? My company uses Rio and I don't see them switching to the cloud any time soon.

replied on September 21, 2021

Our company uses Rio too. We are on Version 11. Can it be used with Version 11?

replied on September 22, 2021

At this very moment it's only available for cloud, but our early access release for self-hosted is right around the corner and will be available for version 11 both Rio and Subscription. If you're interested in getting on the early access list, talk to your SP (or if you are an SP, reach out to us!)

replied on September 22, 2021

I definitely will ask them! THANKS
