
Question

Create XML Import File(s) From CSV

asked on February 6, 2024

Hello,

I have a large number of PDF files to import into Laserfiche, several thousand.

I have a template created with metadata fields to hold information about each PDF file: employee first name, last name, ID, DOB, etc. This data is currently held in a CSV file.

I'm familiar with the process to use the Import Agent to utilize XML files in order to populate the template data and load the PDFs from my local drive to Laserfiche Cloud. I've performed this process multiple times when a professional services vendor mass scanned documents and provided the PDFs. 

This time, I am migrating PDF files from a previous document management system to Laserfiche.

My question is this: is there a guide or any detailed information available that walks through how to take the data in the CSV file that is needed to populate the template and get it into a workable XML file that I can use for importing everything? I know that I can use either individual XML files per PDF file or one XML file containing information about all the PDF files. Where I'm a bit stuck is how to generate the line items and tags from the CSV in the proper format the Import Agent requires. I can create and modify the data manually in an XML file, but that's not practical when I have thousands of files to load and thousands of line items in the CSV.

Thanks in advance for any references and help.

 


Replies

replied on February 6, 2024

Not sure if this helps, but here is an online CSV-to-XML converter:

https://www.convertcsv.com/csv-to-xml.htm

You can find similar tools by searching for "convert csv to xml".

replied on February 6, 2024

But for specifically formatted XML, you'll need to do some coding, either with your own programming tool or in a Workflow script.
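
As a rough illustration of that kind of coding work, here is a minimal Python sketch that turns CSV rows into a single XML document. The element and attribute names (`Documents`, `Document`, `Field`, `name`) and the function name `csv_to_xml` are placeholders chosen for this example; they are not the Import Agent schema, so the tags would have to be remapped to whatever the schema and sample files shipped with Import Agent require.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="Documents", row_tag="Document"):
    """Return one XML string with one <row_tag> element per CSV row.

    All tag names here are hypothetical placeholders, NOT the
    Import Agent schema. Replace them with the real element names.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # One <Field name="..."> element per CSV column.
            field = ET.SubElement(doc, "Field", name=column)
            field.text = value
    # ElementTree escapes characters such as & and < in text for us.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "FirstName,LastName,ID\nAda,Lovelace,1001\n"
    print(csv_to_xml(sample))
```

Building the XML with a library rather than string concatenation avoids malformed output when a field contains characters such as `&` or `<`.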

replied on February 7, 2024

Hi Mike,

Besides XML, Import Agent also supports two Laserfiche-specific formats that let you import multiple documents at once: list files and briefcase files, as mentioned in https://doc.laserfiche.com/laserfiche.documentation/en-us/Default.htm#Import-Agent-Options.htm.

A list (.lst) file could be an option here: process each row of the CSV and generate a .lst file that follows the schema. The schema definition file and sample XML files are included in the installation directory (e.g., C:\Program Files\Laserfiche\Import Agent\List File Examples). A .lst file is similar to a text file, so you could refer to https://stackoverflow.com/questions/36858710/create-text-files-for-every-row-in-an-excel-spreadsheet-with-data-from-different for how to iterate over each row of the CSV, create a file, and then write content to the created file.
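
The "one file per CSV row" idea could be sketched in Python roughly as follows. The function name `write_file_per_row`, the `name_column` parameter, and the `column=value` file body are assumptions made for illustration only; a real .lst file would need to follow the schema and examples in the installation directory mentioned above. The sketch also assumes the values in the naming column are unique and safe to use as file names.

```python
import csv
from pathlib import Path


def write_file_per_row(csv_path, out_dir, name_column, suffix=".txt"):
    """Create one text file per CSV row and return the paths written.

    The file body is placeholder 'column=value' lines, not the real
    list-file format. Swap in content that follows the schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Assumes name_column holds unique, file-name-safe values.
            target = out / f"{row[name_column]}{suffix}"
            lines = [f"{column}={value}" for column, value in row.items()]
            target.write_text("\n".join(lines) + "\n", encoding="utf-8")
            written.append(target)
    return written
```

For example, `write_file_per_row("metadata.csv", "out", "ID")` would create `out/<ID>.txt` for each row, where `metadata.csv` and the `ID` column are hypothetical names.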

replied on February 12, 2024

Thanks for the feedback. This confirms my thoughts that a bit more customization is going to be needed to get this working. I appreciate the suggestions and will see what the best way is to get the formatting correct for our import.

replied on February 13, 2024

For example files you can use for the Import Agent, go to this file location on the computer your Import Agent is running on: 

C:\Program Files\Laserfiche\Import Agent\List File Examples

Specifically, you'd probably want to reference the file called "Template Population Example.lst".

 

I recently migrated ~1.5 million documents from another ECM system into Laserfiche. My approach was a bit different than what you might expect, but maybe it will help you think about other options.

I first wrote a PowerShell script that would parse the CSV files that contained the metadata into datatables, then create SQL lookup tables for each repository and bulk insert the datatables into those SQL lookup tables.
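
My actual script was PowerShell, but the same idea can be sketched in Python. This is only an illustration: it uses the standard-library `sqlite3` module as a stand-in for the SQL server, stores every column as text, and the table name `doc_metadata` and function name `load_lookup_table` are made up for the example.

```python
import csv
import io
import sqlite3


def load_lookup_table(conn, csv_text, table="doc_metadata"):
    """Create a lookup table from the CSV header and bulk insert the rows.

    sqlite3 is a stand-in for the real SQL server; the table name is
    hypothetical and must come from trusted input, not user data.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Skip malformed rows whose column count does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count = load_lookup_table(conn, "ID,LastName\n1001,Lovelace\n1002,Hopper\n")
    print(count, "rows loaded")
```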

I created an Import Agent profile that would save the file path as a metadata field and set a tag that my workflow could use as a trigger.

Since some of the file information was contained in the file path itself, I created a workflow that would parse the file path into variables that I could use to then query SQL for each document's associated metadata, then apply that metadata to the document and remove the tag.
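
The path-parsing step was done with workflow activities, but its logic can be illustrated with a short Python sketch. The folder layout assumed here (`...\<Department>\<EmployeeID>\<file>.pdf`), the key names, and the function name `parse_import_path` are hypothetical; a real layout would depend on how the source system exported its files.

```python
from pathlib import PureWindowsPath


def parse_import_path(path_text):
    """Split a Windows file path into values usable as lookup keys.

    Assumes a hypothetical layout: ...\\<Department>\\<EmployeeID>\\<file>.pdf
    """
    path = PureWindowsPath(path_text)
    if len(path.parts) < 3:
        raise ValueError(f"path too short to parse: {path_text!r}")
    return {
        "department": path.parts[-3],
        "employee_id": path.parts[-2],
        "document_name": path.stem,  # file name without the extension
    }
```

The returned values could then be used as parameters in the SQL query that fetches each document's metadata.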

I was able to prevent bogging down my workflow server and limit the number of simultaneously-running workflows by making this workflow an hourly scheduled workflow that would run a search for the first (X) number of documents that were missing metadata but had the previously set tag applied. It had a deadline of 55 minutes, so it would never run longer than an hour and overlap with the next scheduled workflow. It worked out perfectly and I was able to fully import and apply the metadata to all of the documents within a month.
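
The throttling logic (take at most X documents per run and stop before the next scheduled run) was built from a scheduled workflow and a search, but the pattern itself can be sketched in Python. The names `process_batch` and `apply_metadata` and the default batch size of 500 are placeholders for this example; only the 55-minute deadline comes from my setup.

```python
import time


def process_batch(pending, apply_metadata, batch_size=500,
                  deadline_seconds=55 * 60, clock=time.monotonic):
    """Process up to batch_size items, stopping once the deadline passes.

    `pending` is a list of items still missing metadata and
    `apply_metadata` is a callable handling one item. Both are
    hypothetical stand-ins for the search results and the workflow.
    """
    started = clock()
    done = 0
    for item in pending[:batch_size]:
        # Check the deadline before starting each item so a run never
        # begins new work after its time budget is used up.
        if clock() - started >= deadline_seconds:
            break
        apply_metadata(item)
        done += 1
    return done
```

Items that are not reached in one run simply remain pending and are picked up by the next scheduled run.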

Maybe it would've been easier or more efficient to write a script to parse the CSVs into XML files for each of the documents so that the Import Agent could apply the metadata, but since I am very comfortable with PowerShell, SQL, and workflows, it was easier for me to do it this way.
