Question

Bulk Importing

asked on January 19, 2016

We're working with a contractor to bulk scan thousands of files into TIF. They're including relevant metadata in a delimited file. What's the best way to import the documents into the Repository while applying the delimited data for each file?

Replies

replied on January 19, 2016

Are they going to be using Import Agent? I'm guessing these are external files being scanned for the first time, rather than existing documents in the repository being converted to TIF images (using the Generate Pages option)?

replied on January 19, 2016

That's what I was assuming. The files are all individual TIFs with the metadata in a single text file with one line per image.

replied on January 19, 2016

Can you provide an example of what the text file would contain (formatting included)?

replied on January 19, 2016

Yes, using ! as the delimiter, the following is an example from the metadata file. For the purpose of this example, I've replaced the actual data with a description of what it contains.

APPNumber!LastName!FirstName!!County!City!Year!Filename1.TIF

APPNumber!LastName!FirstName!!County!City!Year!Filename2.TIF

APPNumber!LastName!FirstName!!County!City!Year!Filename3.TIF
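As an illustration of how a file in this format could be read, here is a minimal Python sketch that splits each line on `!` and indexes the records by image filename. The column names are taken from the placeholders above; the fourth field is empty in every sample line, so it is given the hypothetical name `Unused`. The function name `parse_metadata` is also hypothetical, and the sketch assumes the contractor's file contains no quoted fields.

```python
import csv

# Column names follow the placeholder header in the sample lines.
# "Unused" is a hypothetical name for the empty fourth field ("!!").
FIELDS = ["APPNumber", "LastName", "FirstName", "Unused",
          "County", "City", "Year", "Filename"]


def parse_metadata(text):
    """Return a dict mapping each TIF filename to its metadata fields."""
    records = {}
    # QUOTE_NONE: assume the file has no quoted fields, so '"' is ordinary text.
    reader = csv.reader(text.splitlines(), delimiter="!", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # blank lines between records produce empty rows
            continue
        if len(row) != len(FIELDS):
            raise ValueError(
                f"line {reader.line_num}: expected {len(FIELDS)} fields, got {len(row)}")
        record = dict(zip(FIELDS, row))
        name = record["Filename"]
        if name in records:  # the filename is the key, so it must be unique
            raise ValueError(f"line {reader.line_num}: duplicate filename {name}")
        records[name] = record
    return records
```

Keying the records by filename means each scanned image can be matched to its metadata with a single dictionary lookup. Validating the field count up front also catches malformed lines before anything is written to the repository.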

replied on January 25, 2016

So, any suggestions?

replied on January 25, 2016

Michael,

As a PDP (Professional Developer Partner), I would write a small utility using the SDK to open the CSV, step through it one line at a time, import the matching TIF image, then assign the metadata and store the document in the repository. I'm not sure how I would handle OCR: either OCR at import time (slower) or the scheduled OCR activity in Workflow once all documents are imported.
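The line-at-a-time loop described above could look roughly like the following Python sketch. The thread doesn't show the actual SDK calls, so the repository step is left as a hypothetical `store(tif_path, record)` callback that you would replace with the real import and metadata assignment. The names `import_batch` and `store` are illustrative only. The sketch also collects any filenames that have no matching TIF, so missing scans can be reported instead of silently skipped.

```python
import csv
from pathlib import Path

# Same hypothetical column names as the sample metadata lines.
FIELDS = ["APPNumber", "LastName", "FirstName", "Unused",
          "County", "City", "Year", "Filename"]


def import_batch(metadata_path, image_dir, store):
    """Read the metadata file one line at a time and pass each TIF and its
    fields to `store` (a hypothetical stand-in for the SDK calls).
    Returns the filenames listed in the metadata that had no image file."""
    image_dir = Path(image_dir)
    missing = []
    with open(metadata_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="!", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:  # skip blank lines
                continue
            if len(row) != len(FIELDS):
                raise ValueError(f"line {reader.line_num}: unexpected field count")
            record = dict(zip(FIELDS, row))
            tif_path = image_dir / record["Filename"]
            if not tif_path.is_file():
                missing.append(record["Filename"])
                continue
            store(tif_path, record)  # import the image and assign its metadata
    return missing
```

Because the utility is a single process reading the file sequentially, it avoids the concurrent-access problem that comes up when several workflow instances read the same file.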

If you used Import Agent to import the TIFs and then Workflow to open the CSV and populate the metadata, you would run into an issue where multiple instances of the workflow could open the CSV at the same time (CSVs used as data sources are not inherently thread-safe). If Import Agent and Workflow were my only options, I would import the CSV into SQL as a table, then use Workflow to search the SQL database for each newly imported image (using the filename as the key) to retrieve the remaining metadata and populate the template. The obvious downside to this approach is that, on a production system, it would most likely have a significant impact on system performance.
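To make the SQL staging idea concrete, here is a small Python sketch that loads the `!`-delimited lines into a table keyed by filename and then looks up one image's metadata. It uses SQLite purely as a self-contained stand-in for whatever SQL server the workflow would actually query. The table name `scan_metadata` and the function names are hypothetical. The sketch assumes the repository entry name matches the `Filename` column exactly. If the imported entries are named differently (for example, without the `.TIF` extension), the lookup key would need to be adjusted.

```python
import csv
import sqlite3

COLUMNS = ["APPNumber", "LastName", "FirstName", "County", "City", "Year"]


def load_metadata_table(conn, lines):
    """Stage '!'-delimited metadata lines (e.g. an open file) in a SQL table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS scan_metadata (
        Filename TEXT PRIMARY KEY, APPNumber TEXT, LastName TEXT,
        FirstName TEXT, County TEXT, City TEXT, Year TEXT)""")
    rows = [
        # Reorder so Filename (8th field) comes first; drop the empty 4th field.
        (r[7], r[0], r[1], r[2], r[4], r[5], r[6])
        for r in csv.reader(lines, delimiter="!", quoting=csv.QUOTE_NONE)
        if r  # skip blank lines
    ]
    with conn:  # commits the inserts as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO scan_metadata VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def lookup(conn, filename):
    """Return the metadata for one image, or None if the filename is unknown."""
    row = conn.execute(
        "SELECT APPNumber, LastName, FirstName, County, City, Year "
        "FROM scan_metadata WHERE Filename = ?", (filename,)).fetchone()
    return None if row is None else dict(zip(COLUMNS, row))
```

Making `Filename` the primary key gives the per-document lookup an index. The database also handles concurrent readers, which a shared text file does not.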

If you need further direction or assistance you are welcome to email me at cprimmer@qfiche.com.
