Discussion

Attempting to use Smart Fields in order to extract table data from a PDF image

posted on July 24

I have multiple PDF files that are pictures of a table of data and I am looking to use Smart Fields to extract this data into a grouped template. The columns in the table are the following:

Product Description
Quantity
Code
List Price
SP Price

I created a template with the following descriptions

Smart Fields extracts all the data except the Code. It captures the data for every other column across all rows, but only includes the code half the time, leaving it empty for many rows. There should always be a code.

I am not sure how I can be more descriptive about what the Code field is; the column is clearly named Code.

The table is formatted like this, but in crisp, high resolution. Smart Fields has no problem reading the data correctly; it is just missing the code for many of the products. I don't think it is capable of understanding the formatting here.

replied on August 4

The wrinkle with this format is the placement of the text in the Code column. You will notice it is vertically centered, so it lines up with only one of the rows, and you need to indicate that the code should be used for all of the rows it spans. For this particular column, you just need to elaborate on what you want when no code is found. Here is the description I used for the Code column:

Code. If a code is not found, use the same code for all rows with the same description
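
As a rough illustration of what that description asks for (not how Smart Fields works internally), the same rule can be sketched in Python as a post-processing step over already-extracted rows. The field names (`description`, `quantity`, `code`) and the sample values are hypothetical, chosen only for this example:

```python
def fill_missing_codes(rows):
    """Copy a code onto rows that share a product description but lack one."""
    # First pass: remember the first non-empty code seen for each description.
    code_by_description = {}
    for row in rows:
        code = (row.get("code") or "").strip()
        if code:
            code_by_description.setdefault(row["description"], code)

    # Second pass: fill blanks; rows with no known code stay empty.
    filled = []
    for row in rows:
        code = (row.get("code") or "").strip()
        if not code:
            code = code_by_description.get(row["description"], "")
        filled.append({**row, "code": code})
    return filled


# Hypothetical rows: the code was captured only on the middle "Widget A" row.
rows = [
    {"description": "Widget A", "quantity": "1-9", "code": ""},
    {"description": "Widget A", "quantity": "10-49", "code": "WA-100"},
    {"description": "Widget A", "quantity": "50+", "code": ""},
    {"description": "Widget B", "quantity": "1-9", "code": "WB-200"},
]

for row in fill_missing_codes(rows):
    print(row["description"], row["quantity"], row["code"])
```

This only helps when the description is captured correctly on every row, which matches what was reported for this table.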

 

Here are the results using the image you provided above:

replied on August 5

I was surprised that this worked for you with my quick-and-dirty example image, because it does not work with the actual document I am using, which is confidential. The actual document is of much higher quality; the example image was made from it, with the descriptions, codes, and prices replaced using Paint.

I had also tried other, similar prompts without much luck. The Product Description column is also vertically centered (I suppose it sits on row 2 rather than row 3, but it is similar), yet Smart Fields never makes a mistake with it.

I just tried my example image with the template and, oddly enough, it does work. The only difference, besides the image being ultra low quality, is that the real document has many more rows.

When I try your prompt exactly, it misses fewer codes, but the codes it returns are all wrong because they are mixed in from other rows.

replied on August 6

Chad, we tested Shawn's prompt on a document similar to the original one you used (which, I'm guessing from the looks of your screenshot, was a version of the LF price sheet). Could you open a support case and attach your template definition and your sample doc so we can take a closer look?

replied on August 6

I realized the one difference is that my mock-up image is a PNG and the real document is a PDF. Testing with native image formats like TIFF improved the results greatly! It seems that using the PDF format was the issue.
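
If it helps anyone sorting through a mixed batch, here is a minimal Python sketch for checking which files are really PDFs and which are PNG or TIFF images, based on the leading signature bytes of each format rather than the file extension. The function names are made up for this example, and the sketch only identifies the format; it does not convert anything:

```python
def detect_format(header: bytes) -> str:
    """Guess the file format from the first few bytes of a file."""
    if header.startswith(b"%PDF-"):
        return "pdf"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # TIFF starts with a byte-order mark: little-endian "II" or big-endian "MM".
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"


def detect_file_format(path) -> str:
    """Read the first 8 bytes of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return detect_format(f.read(8))
```

For example, `detect_file_format("sample.pdf")` (a hypothetical file name) would return `"pdf"` for a real PDF even if the file had been renamed with an image extension.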

replied on August 4

Hi, Chad,

 

Try to be more specific with the description prompt. If the codes always follow the three/three format (for example, ABC123), you might try telling the field: "For each Product, extract its Product Code, which is three letters followed by three numerals."

 

I've found that the more specific you can be, the better your chances of Smart Fields picking up what you want it to.
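
If your codes do follow a fixed format, the same rule can also be used to spot-check the extracted values afterwards. This is only a sketch in Python, assuming the "three letters followed by three numerals" format from my example; the sample values and names are invented:

```python
import re

# Three letters followed by three digits, e.g. ABC123 (assumed format).
CODE_PATTERN = re.compile(r"[A-Za-z]{3}[0-9]{3}")


def matches_code_format(value: str) -> bool:
    """Return True if the whole value is three letters then three digits."""
    return CODE_PATTERN.fullmatch(value.strip()) is not None


# Hypothetical extracted values, including a blank and a malformed code.
extracted = ["ABC123", "AB1234", "", "xyz789"]
suspect = [code for code in extracted if not matches_code_format(code)]
print(suspect)
```

Anything that lands in `suspect` is a row worth checking by hand.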

replied on August 5

I don't think I can be specific for a pattern with the codes. They do not follow any pattern like this.
