
Question

Quickfields/Workflow - Search Immediately Above or Below a Found Pattern for Another Pattern

asked on August 22, 2014

I don't think this is possible, but since I keep getting asked, I thought I would check with the community.

They would like to search all text on the page for a pattern, then search immediately above or below it, within a specified pixel range, for another pattern.

I know that LF tracks the location of all OCR'd text on the page, but I don't know how to access this data.

For example, if I searched for the pattern "P.O. NUMBER" and specified a search range below it (x400y92) with the pattern (\d+), I would get 9876 from this invoice.
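
For anyone who does have word-level OCR coordinates (the thread does not establish how, or whether, Laserfiche exposes them), the requested behavior can be sketched in Python. This is a minimal sketch under several assumptions: the `OcrWord` class, the `find_anchor`/`find_below` helpers and the sample coordinates are all hypothetical, words on one line are assumed to share the same top edge, and "x400y92" is read as a box 400 px wide and 92 px tall starting just below the anchor phrase.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrWord:
    """Hypothetical OCR word with a pixel bounding box (origin top-left)."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def find_anchor(words, tokens):
    """Return (left, top, right, bottom) of the first run of consecutive
    words, in reading order, whose texts equal `tokens` (case-insensitive)."""
    # Reading order assumes words on one line share the same top edge.
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    n = len(tokens)
    for i in range(len(ordered) - n + 1):
        run = ordered[i:i + n]
        if all(w.text.upper() == t.upper() for w, t in zip(run, tokens)):
            return (min(w.x for w in run), min(w.y for w in run),
                    max(w.x + w.w for w in run), max(w.y + w.h for w in run))
    return None


def find_below(words, tokens, value_pattern, dx=400, dy=92):
    """Find the anchor phrase, then return the first word (top-to-bottom,
    left-to-right) whose top-left corner lies in a dx-by-dy pixel box that
    starts at the anchor's left edge just below it, and that fully matches
    value_pattern. Returns group 1 if the pattern has a capturing group."""
    box = find_anchor(words, tokens)
    if box is None:
        return None
    left, _, _, bottom = box
    value_re = re.compile(value_pattern)
    candidates = [w for w in words
                  if left <= w.x < left + dx and bottom < w.y <= bottom + dy]
    for w in sorted(candidates, key=lambda w: (w.y, w.x)):
        m = value_re.fullmatch(w.text)
        if m:
            return m.group(1) if value_re.groups else m.group(0)
    return None


# Made-up word boxes loosely modeled on the invoice header described above.
sample_words = [
    OcrWord("INVOICE", 50, 20, 120, 30),
    OcrWord("P.O.", 100, 200, 40, 20),
    OcrWord("NUMBER", 145, 200, 90, 20),
    OcrWord("SHIP", 400, 200, 50, 20),
    OcrWord("DATE", 455, 200, 50, 20),
    OcrWord("9876", 100, 240, 60, 20),
    OcrWord("08/22/14", 400, 240, 100, 20),
]

print(find_below(sample_words, ["P.O.", "NUMBER"], r"(\d+)"))  # 9876
```

Real OCR output would need some tolerance for words on the same line whose top edges differ by a few pixels; this sketch does not attempt that.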

Replies

replied on August 22, 2014

I'm curious why you need that function. Based on your description, I might Zone OCR a reasonably large box around the general area and pattern match to (\d+)\s+\d\d/\d\d/\d\d so you capture the whole number ahead of the ship date. Zone OCR is a way of specifying the location you want, right?

Or something like that. Just an idea.
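
The Zone OCR idea above can be tried outside Laserfiche with Python's `re` module. This sketch puts `\s+` between the number and the date, on the assumption that the OCR text separates them with whitespace; the zone text is made up and `po_before_date` is a hypothetical helper name.

```python
import re

# Digits, then whitespace, then a dd/dd/dd ship date.
PO_BEFORE_DATE = re.compile(r"(\d+)\s+\d\d/\d\d/\d\d")


def po_before_date(zone_text):
    """Return the number that precedes the first dd/dd/dd date, or None."""
    m = PO_BEFORE_DATE.search(zone_text)
    return m.group(1) if m else None


# Made-up text standing in for what Zone OCR might return for that box.
zone_text = "P.O. NUMBER    SHIP DATE\n9876    08/22/14"
print(po_before_date(zone_text))  # 9876
```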

replied on August 22, 2014

No Zone OCR allowed; they want it to work without a specific form, for example on invoices from any possible vendor. I believe this is something a product called ABBYY offers.

replied on August 22, 2014

You would need to do an extreme amount of pattern matching, and you would definitely need to enable decolumnization in the OCR engine parameters.

replied on August 22, 2014

Can you not use Zone OCR with decolumnization to get this?

replied on August 22, 2014

There is no way to get data like that from full page OCR.

replied on August 25, 2014

Given that the value for «P.O. Number» appears on the following line in your example, you can use Retrieve Document Text and then run Pattern Matching on the text as follows:

...then find the line in %(Line_) that contains the text "P.O. Number" and whose next line matches \d+.
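
The Workflow pattern itself is not shown in the thread, so here is a Python sketch of the same line-by-line logic instead. It assumes the page text arrives as a single string with line breaks (standing in for Retrieve Document Text) and uses a plain list of lines in place of the `%(Line_)` token; the sample text and the `po_from_next_line` name are hypothetical.

```python
import re


def po_from_next_line(page_text, label="P.O. Number"):
    """Find a line containing `label` (case-insensitive) and return the first
    run of digits on the line after it; None if no such pair exists."""
    lines = page_text.splitlines()  # plays the role of the %(Line_) values
    for current, following in zip(lines, lines[1:]):
        if label.lower() in current.lower():
            m = re.search(r"\d+", following)
            if m:
                return m.group(0)
    return None


sample_text = (
    "ACME SUPPLY CO.\n"
    "Invoice 1001\n"
    "P.O. NUMBER    SHIP DATE\n"
    "9876    08/22/14\n"
    "Total 250.00"
)
print(po_from_next_line(sample_text))  # 9876
```

Note that `re.search` takes the first digits anywhere on the next line, so a vendor that prints some other number first on that line would produce a wrong result.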

To add more context and thereby prevent false positives, include the date to the right, so the pattern might end up something like \s+(\d+)\s+\d\d/\d\d/\d\d\s*
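
A tidied form of that pattern can be checked in Python. It uses `\s` on its own, since `\s` already matches tabs and a `|` inside `[...]` is a literal pipe rather than an alternation, and it expects a forward-slash date. Python's `re` is used here as a stand-in, on the assumption that it treats these constructs the same way as the regex engine behind Workflow's Pattern Matching; the sample text is made up.

```python
import re

# Whitespace, the P.O. digits, whitespace, then a dd/dd/dd date.
PO_WITH_DATE = re.compile(r"\s+(\d+)\s+\d\d/\d\d/\d\d\s*")

page_text = (
    "P.O. NUMBER\tSHIP DATE\n"
    "9876\t08/22/14\n"
    "Invoice 1001 dated 08/22/14"
)

# Only the number directly followed by a date is captured; "1001" is
# skipped because the word "dated" sits between it and the date.
print(PO_WITH_DATE.findall(page_text))  # ['9876']

# A character class like [\s|\t] also accepts a literal pipe character.
print(bool(re.fullmatch(r"[\s|\t]", "|")))  # True
```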

replied on August 25, 2014

The problem with this is that we have no idea what we are going to get on the next line; one vendor may not put the date in the same location as another. I think the only way to do this is to use a defined zone.
