Question

Importing PDFs with indexed information (metadata)

asked on August 22, 2017

We are getting quotes from a couple of different vendors for back scanning and indexing, and I'm having trouble wrapping my mind around how this works, so I'm hoping someone with experience can help.

Vendor 1: Software agnostic (i.e., not associated with Laserfiche). Their proposal has them scanning and indexing all the files, pulling our desired metadata from the existing cover sheet of each document. They will then provide us with what they called a "load file" in .lst format along with the imaged files. They do not import into Laserfiche; we are responsible for that. What I do not understand is how that attached metadata is actually extracted and populated into the template's metadata fields. Is this something we'd then have to write a workflow for? Because the vendor is software agnostic, they only talked about this in general terms, which didn't give me enough comfort that at the end of the day all our documents and metadata would automatically be there with little to no extra effort from us. Has anyone had experience with this?

Vendor 2: Laserfiche VAR. I feel very confident in their knowledge of Laserfiche import, but their quote only includes scanning, OCR, and one metadata field. Indexing the rest of the desired metadata fields would be handled separately through workflow and would be more expensive.

I would appreciate feedback from anyone who has undertaken a large back scanning project either with a VAR or another vendor. 

Thanks!

Replies

replied on August 22, 2017

We have not used a vendor for any large scanning project. But reading your post, I suspect the "load file" from Vendor 1 would be an XML file, a CSV file, or something similar. If that is the case, then yes, you would need to build a workflow that puts the metadata on the PDF file after it is imported. You would probably want to look at having the "load file" pulled into a SQL database, then have the workflow connect to the database to pull the information.
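To make the "load file into a database" idea concrete, here is a minimal sketch in Python. It assumes the load file is a CSV and uses SQLite as a stand-in for the SQL database; the column names (`file_name`, `case_number`, `client_name`), the table name, and the sample rows are all hypothetical, and the vendor's real file would dictate the actual layout. The lookup function plays the role of the query a workflow would run for each imported document.

```python
import csv
import io
import sqlite3

# Hypothetical load-file content. Real column names depend on the vendor.
SAMPLE_LOAD_FILE = """file_name,case_number,client_name
0001.pdf,2017-0042,Jane Doe
0002.pdf,2017-0043,John Roe
"""


def load_into_db(conn, load_file):
    """Copy every row of the CSV load file into a lookup table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_file ("
        "file_name TEXT PRIMARY KEY, case_number TEXT, client_name TEXT)"
    )
    rows = [
        (r["file_name"], r["case_number"], r["client_name"])
        for r in csv.DictReader(load_file)
    ]
    conn.executemany("INSERT OR REPLACE INTO load_file VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def lookup(conn, file_name):
    """Return the metadata for one imported file, or None if it is not listed."""
    cur = conn.execute(
        "SELECT case_number, client_name FROM load_file WHERE file_name = ?",
        (file_name,),
    )
    row = cur.fetchone()
    if row is None:
        return None
    return {"case_number": row[0], "client_name": row[1]}


conn = sqlite3.connect(":memory:")
count = load_into_db(conn, io.StringIO(SAMPLE_LOAD_FILE))
print(count, lookup(conn, "0001.pdf"))
```

Keying the table on the file name is one possible design choice: it lets the lookup use whatever name the document keeps after import, provided the vendor's file names are unique.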

As for the Laserfiche VAR, you say the quote includes OCRing the documents. So to get fields beyond the one metadata field in the quote, you would need to set up a workflow that reads the OCR text associated with each PDF and uses pattern matching to pull out the rest of the information.
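As a rough illustration of what pattern matching against OCR text looks like, here is a small Python sketch using regular expressions. This is not Laserfiche Workflow itself; the cover-sheet labels ("Case No", "Client"), the field names, and the sample text are assumptions made up for the example, and real patterns would have to be written against your actual cover sheets and tolerate OCR errors.

```python
import re

# Hypothetical patterns for a hypothetical cover sheet layout.
PATTERNS = {
    "case_number": re.compile(
        r"Case\s*(?:No\.?|Number)\s*:?\s*(\d{4}-\d{4})", re.IGNORECASE
    ),
    "client_name": re.compile(r"Client\s*:\s*(.+)", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return one value per field, or None where the pattern finds nothing."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1).strip() if match else None
    return fields


sample = "COVER SHEET\nCase No: 2017-0042\nClient: Jane Doe\nReceived 08/22/2017\n"
print(extract_fields(sample))
```

Fields that come back as None are the ones that would need manual review, which is worth factoring into the cost comparison between the two quotes.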

replied on August 22, 2017

Going back over my notes, the load file would be a .lst file. I'm not familiar with this file type, but after looking a little more on LF Answers, it seems it's supported through Import Agent. I'm just wondering how easy it is.

replied on August 22, 2017

As long as you have Import Agent configured to recognize and import LST files, that part of things is relatively straightforward.

If you go to the install folder for Import Agent, it should have examples for both .lst and .xml files that Import Agent can bring in and use to populate metadata for a new entry.

For example:

C:\Program Files\Laserfiche\Import Agent\List File Examples

In the simplest terms, an LST or XML file works like a set of instructions for Import Agent, detailing what file to import, what to name it, where to store it, and what metadata to assign.

Some of these things, like the folder path in LF, are not required in the LST or XML file and can instead be set within the import profile. It would be a good idea to find a balance between what you configure and what the vendor provides, which keeps maintenance costs lower if you ever need to change something.

replied on August 23, 2017

Just want to note that an LST file is just a text file with a certain syntax. The question is whether the syntax used by the scanning vendor is LF compatible for use with Import Agent. If not, you would still need to create some sort of workflow to "look up" the information and apply it to the imported documents.

replied on August 23, 2017

Yes, Bert makes an important point. Personally, I prefer using XML instead of LST because the formatting and parameters are a bit more explicit.
