Question

Feature Request: Add RegEx qualifiers to the data zone in addition to the anchor zone

asked on May 2

A big problem in all data capture projects is the form identification process.  With Capture Profiles, we are finding that certain profiles tend to be aggressive, incorrectly identifying documents in unpredictable ways. 

The request is to add a regex qualifier to the captured data, in addition to the one on the anchor zone, so that identification can be more precise. It is like saying, "if ABC is here and XYZ is there, then this form is one type. If these fields are reversed, the form is of a different type."
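To make the request concrete, here is a minimal sketch in Python of what such a qualifier might do. It is not an existing product feature: `QualifiedZone`, `zone_qualifies`, and the sample patterns are hypothetical names made up for illustration, and Python's `re` syntax may differ from the regex syntax the product uses.

```python
import re
from dataclasses import dataclass


@dataclass
class QualifiedZone:
    """Hypothetical zone with a regex on the anchor AND on the captured data."""
    anchor: re.Pattern          # pattern the anchor text must contain
    data_qualifier: re.Pattern  # pattern the captured value must fully match


def zone_qualifies(zone: QualifiedZone, anchor_text: str, captured_text: str) -> bool:
    """The zone counts toward identification only if both patterns match."""
    anchor_ok = zone.anchor.search(anchor_text) is not None
    data_ok = zone.data_qualifier.fullmatch(captured_text.strip()) is not None
    return anchor_ok and data_ok


# Example: "PO Number" must be followed by a six-digit value (assumed format).
po_zone = QualifiedZone(
    anchor=re.compile(r"PO Number"),
    data_qualifier=re.compile(r"\d{6}"),
)
```

With this sketch, `zone_qualifies(po_zone, "PO Number:", "123456")` passes, while the same anchor next to a value such as `"Net 30"` does not, so a form with the fields reversed would no longer count as a match.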

In the end, if you can't land your zones, you'll not get the results you want.  This is relevant for any Capture Group with multiple profiles.

Answer

SELECTED ANSWER
replied on May 2

Your feature request has been noted and it ties into some related feature requests about zone validation.

Regarding

With Capture Profiles, we are finding that certain profiles tend to be aggressive, incorrectly identifying documents in unpredictable ways. 

Yes, we are aware of this issue--it was an early design decision (to aggressively try to find a match) that hasn't played out the way we expected. The current behavior is that if any zone anchor matches (or any barcode zone is found) anywhere on the page, that profile is considered a candidate. Once the candidates have been determined, the Capture Profile Group decides between them based on the number of zone anchors matched and the number of barcodes found, weighted by the distance between the original location of each zone and the location where it was found. One implication of this design is that a zone anchored to a generic word (e.g., Total, Name, Address) will match a lot of different documents, even when you wouldn't expect it to. Another implication is that zones that don't match are irrelevant to the determination. For example, if a capture profile has twenty anchored zones and only one matches for a given document, that profile is still considered a candidate (there is no minimum threshold). And if no other profiles match at all, that capture profile will be used.
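The selection behavior described above can be sketched roughly as follows. This is an illustration only, not the product's implementation: the names `ZoneMatch`, `match_weight`, and `pick_profile` are hypothetical, and the exact distance weighting is not stated in this thread, so the formula below is an assumption.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional, Tuple


@dataclass
class ZoneMatch:
    """One anchor or barcode that was found on the page."""
    expected: Tuple[float, float]  # where the zone was drawn in the profile
    found: Tuple[float, float]     # where the anchor/barcode was actually found


def match_weight(match: ZoneMatch, scale: float = 100.0) -> float:
    """ASSUMED weighting: 1.0 at zero displacement, shrinking with distance."""
    dx = match.found[0] - match.expected[0]
    dy = match.found[1] - match.expected[1]
    return 1.0 / (1.0 + hypot(dx, dy) / scale)


def pick_profile(matches_by_profile: Dict[str, List[ZoneMatch]]) -> Optional[str]:
    """Pick the candidate with the highest distance-weighted match count."""
    # Any profile with at least one match is a candidate; zones that did
    # not match are ignored, and there is no minimum threshold.
    candidates = {name: ms for name, ms in matches_by_profile.items() if ms}
    if not candidates:
        return None
    return max(candidates, key=lambda name: sum(match_weight(m) for m in candidates[name]))
```

Note that a profile with a single far-away match still wins when no other profile matches, which is the "aggressive" behavior discussed above.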

For now, the workaround is to make sure ALL of your anchors are as specific as possible.  Anchors don't have to be next to the zone they are capturing, they don't have to be a single word, and they can use regular expressions--so make the anchors very specific for any capture profile being used in a Capture Profile Group.
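As a rough illustration of why specific anchors help, the Python sketch below compares a generic anchor with a more specific one against a few sample pages. The company name, page texts, and the `pages_matching` helper are all made up for this example, and Python's `re` syntax may not be identical to the regex syntax supported in anchors.

```python
import re

# A generic anchor versus a specific, multi-word anchor (hypothetical company).
GENERIC_ANCHOR = re.compile(r"Total")
SPECIFIC_ANCHOR = re.compile(r"Acme Freight Ltd\.\s+Purchase Order")

# Made-up OCR text for three different document types.
SAMPLE_PAGES = {
    "acme_po": "Acme Freight Ltd. Purchase Order\nPO 12345\nTotal 99.00",
    "other_invoice": "Globex Invoice\nTotal 10.00",
    "packing_slip": "Packing Slip\nTotal items 4",
}


def pages_matching(pattern, pages):
    """Return the names of all pages on which the anchor pattern is found."""
    return sorted(name for name, text in pages.items() if pattern.search(text))
```

Here `pages_matching(GENERIC_ANCHOR, SAMPLE_PAGES)` returns all three pages, while `pages_matching(SPECIFIC_ANCHOR, SAMPLE_PAGES)` returns only `acme_po`.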

We have some plans to change this behavior, but the timeline for such changes has not been determined. 


Replies

replied on May 2

Thanks Jacob - 

It also sounds like it would be a good idea to add zones even if we don't need the data, so long as they can be relatively unique.  We are finding that company names are the best anchors, as they tend to be distinct. 

Further, some doc types seem more prone to mis-identification than others. We found it effective to create different groups, and put the vulnerable doc types first. Then we cascade through different groups one by one.  Here is the current layout:

 

Tough Nuts (to crack) goes first, followed by Minor then Major:
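For readers without the screenshots, the cascade idea can be sketched in Python as below. This is only a model of the ordering logic: it assumes processing stops at the first group that identifies the document, and `cascade`, `keyword_group`, and the sample rules are hypothetical stand-ins for the real groups.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A group is modelled as a function returning a profile name, or None if
# nothing in that group matched (hypothetical stand-in for a real group).
Group = Callable[[str], Optional[str]]


def keyword_group(rules: Dict[str, str]) -> Group:
    """Build a toy group that picks the first profile whose keyword appears."""
    def identify(page_text: str) -> Optional[str]:
        for profile, keyword in rules.items():
            if keyword in page_text:
                return profile
        return None
    return identify


def cascade(page_text: str, groups: List[Tuple[str, Group]]) -> Optional[Tuple[str, str]]:
    """Try each group in order; the first group that identifies the page wins."""
    for group_name, identify in groups:
        profile = identify(page_text)
        if profile is not None:
            return group_name, profile
    return None


# Vulnerable document types go first so a generic profile cannot claim them.
GROUPS = [
    ("Tough Nuts", keyword_group({"acme_po": "Acme Freight"})),
    ("Minor", keyword_group({"packing_slip": "Packing Slip"})),
    ("Major", keyword_group({"generic_invoice": "Total"})),
]
```

With this ordering, a page containing both "Acme Freight" and "Total" is claimed by the first group rather than by the generic invoice profile in the last one.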

 

 

These zones are tight, so they give us what we want, but also more than what we want:

 

We just want the PO, but the extra lines are OK because we can parse the zone text into a multivalue field and search it with asterisks (wildcards). All in all, it's working better than I expected for a tough application.
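A rough Python sketch of that approach is shown below: split the captured zone text into one value per line, then search the values with an asterisk wildcard. The helper names and the sample zone text are invented for illustration, and the actual parsing and search are done with the product's own tools rather than this code.

```python
from fnmatch import fnmatchcase
from typing import List


def to_multivalue(zone_text: str) -> List[str]:
    """Split captured zone text into one value per non-empty line."""
    return [line.strip() for line in zone_text.splitlines() if line.strip()]


def wildcard_search(values: List[str], pattern: str) -> List[str]:
    """Return the values matching a wildcard pattern such as '*4500123*'."""
    return [value for value in values if fnmatchcase(value, pattern)]


# Made-up zone text: the PO line plus extra lines the zone also picked up.
captured = "Ship Via: Ground\nPO 4500123\n\nTerms Net 30"
values = to_multivalue(captured)
```

Searching `values` with the pattern `*4500123*` finds the PO line even though the field also holds the extra lines.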

 

 

replied on May 2

Follow up: Adding extra zones, especially bar codes, helps with the ID process, along with re-arranging the order in which they are processed.  Add the RegEx on data values and you'll have something with extremely impressive capability.
