Question

Regular Expressions - Something else?

asked on November 14, 2016

I have a QF session I'm working on where the information is on 2 lines.  I was going to do zones; however, the image can shift causing the margins to be significantly different.  The invoice is just not "image friendly" at this time!

 

 

Date has a pattern to it, but I'm also looking to try and grab Invoice Number and Customer P.O. Number - should there be a value.  They will always be directly below in the specified box.  Is there a way to do a top down pattern or is this something else?

 

Thanks for the help - much appreciated!!

Replies

replied on November 14, 2016

Are you saying that the entire table shifts around, or that the formatting of the table itself changes significantly between documents? If it's just the table as a whole shifting around, then using a Form Alignment process might help.

If that doesn't work or if the column widths are highly variable then it might be a bit tricky, because unfortunately there isn't a way to directly tell Quick Fields to dynamically place a zone underneath a specific word/pattern. 

Splitting the table data 'vertically' can be done relatively easily by setting the Zone OCR advanced settings to Single line: False and Create multi-value token: True. However, when it comes to splitting the data 'horizontally', this will have to be achieved primarily with regular expressions. 

You can create OCR Zones wide enough that they will likely contain what you need regardless of the exact column width, then use pattern matching to filter out any extra text captured from the columns to the left and right. Even if the pattern is not as specific and predictable as the date, there's likely some sort of generic pattern you can expect the Invoice Number and Customer P.O. Number to match.
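As a rough illustration of the wide-zone-plus-pattern idea, here is a minimal Python sketch. The sample zone text, the date format, and the helper name `first_match` are all assumptions made for the example, not taken from the actual invoice, and Quick Fields has its own pattern matching process, so the expression would need to be adapted there.

```python
import re

# Assumed sample: text OCR'd from a deliberately wide zone, so it also
# picks up fragments of the columns to the left and right.
zone_text = "TERMS NET 30   11/14/2016   SHIP VIA UPS"

# Assumed date format (e.g. 11/14/2016); adjust to what the invoice really uses.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def first_match(pattern, text):
    """Return the first substring matching pattern, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


invoice_date = first_match(DATE_PATTERN, zone_text)
print(invoice_date)
```

The zone can be generous because the pattern, not the zone boundary, decides what is kept.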

replied on November 15, 2016

I've attached a picture for you.  The margin shifts, causing the table to shift.  I tried cutting off the white border, but didn't get the results I wanted or expected.

I'm getting better every time with regular expressions, but this one will be a challenge.  The row it's on is all numbers.  It comes after the date pattern, so that could help.  It is not necessarily a set number of digits.
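One way to use the date as an anchor for a variable-length number can be sketched in Python as below. It assumes the OCR text of the row reads date, then Invoice Number, then an optional all-digit Customer P.O. Number, separated by whitespace; that column order, the sample values, and the name `parse_row` are hypothetical.

```python
import re

ROW_PATTERN = re.compile(
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})"  # the date anchors the match
    r"\s+(?P<invoice>\d+)"                 # invoice number: any number of digits
    r"(?:\s+(?P<po>\d+)\b)?"               # P.O. number, only if one is present
)


def parse_row(text):
    """Return a dict with date, invoice and po (po may be None), or None if no match."""
    match = ROW_PATTERN.search(text)
    return match.groupdict() if match else None


# Made-up rows, with and without a P.O. number.
print(parse_row("11/14/2016 458213 77120"))
print(parse_row("11/14/2016 458213"))
```

Because `\d+` has no fixed length, the number of digits does not matter. If another all-numeric column can follow the Invoice Number when the P.O. box is empty, this pattern would pick it up by mistake, so the zone should be kept narrow enough to exclude it.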

Thanks for the help!

replied on November 15, 2016

Unfortunately, the pasted sections in your attached .docx file aren't useful to experiment with - you'd need to upload the full pages in their original format (with any sensitive information redacted). 

Have you tried playing with Form Alignment? What are the results?

What does the text look like when you OCR the entire table?

replied on November 16, 2016

Is Form Alignment only available with complete?  I don't see the option...  They do not have complete so it may not be an option.

I've attached actual samples - redacted.

Thanks!

Reg Margin.pdf (91.83 KB)
replied on November 17, 2016

Hi, 

I was able to capture information from the table reliably using Zone OCR along with a couple of conditional processes to make the 'shifted' image line up with the others (this solution does not use Form Alignment). Note that this solution assumes that if the image is 'shifted', then it's always shifted in pretty much the same way. 

 

I started by turning on the following option in the scan source settings to scale to 300x300 DPI for simplicity and consistency because I originally ended up with mismatched DPIs after generating pages from your PDFs. 

 

Then, I decided on a way to uniquely identify when the image is shifted. This can likely be done with pattern matching and/or Zone OCR. 

Then I used this information to set up a Conditional process that only runs in the case where we have a shifted margin. Under the conditional, I added a crop and a resize process:

This takes a bit of trial and error - I ended up with settings like this:

 

After I set that up to get everything to align consistently, I was able to set up OCR Zones for the table that captured data consistently for all the samples you provided. 
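The arithmetic behind the conditional crop and resize described above can be sketched in Python as follows. This is only an illustration of the idea, not the settings used in Quick Fields: the landmark positions, the tolerance, the page size (2550 x 3300 pixels, i.e. Letter at 300 DPI), and all function names are assumptions.

```python
def is_shifted(landmark_left, expected_left, tolerance=20):
    """True when a known landmark is further from its expected x position
    than the tolerance (all values in pixels)."""
    return abs(landmark_left - expected_left) > tolerance


def crop_box(page_width, page_height, landmark_left, landmark_top,
             expected_left, expected_top):
    """Box (left, top, right, bottom) that trims the extra top/left margin
    so the landmark moves back to its expected position."""
    dx = max(landmark_left - expected_left, 0)
    dy = max(landmark_top - expected_top, 0)
    return (dx, dy, page_width, page_height)


def scale_factors(box, target_width, target_height):
    """Horizontal and vertical factors that resize the cropped area
    back to the target page size."""
    left, top, right, bottom = box
    return (target_width / (right - left), target_height / (bottom - top))


# Assumed numbers: the landmark is expected at (150, 300) but found at (260, 400).
if is_shifted(260, 150):
    box = crop_box(2550, 3300, 260, 400, 150, 300)
    print(box, scale_factors(box, 2550, 3300))
```

In practice the crop and resize values still need the trial and error mentioned above, since OCR landmarks are rarely located to the exact pixel.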

replied on November 20, 2016

Excellent - let me give this a try.

I did check what they have for QF. They don't have the Form Alignment option, so something like this would have to work :-)
