posted on January 21, 2018

We recently ran into a need in an audit process we run here involving Laserfiche briefcases. We needed a way to extract the filenames from a briefcase before it was imported into Laserfiche. The briefcases were sent to us by a scanning service.

 

We wrote a Python script that extracts the file names from a briefcase before it is imported into Laserfiche. Here is the script:

 

import os
import shutil
import tarfile
import xml.etree.ElementTree as ET

if __name__ == "__main__":
    for file in os.listdir('.'):
        if not file.endswith(".lfb"):
            continue
        # Strip the ".lfb" extension to get the briefcase name.
        filenamebase = os.path.splitext(file)[0]
        # Skip briefcases that already have an output folder.
        if os.path.exists(filenamebase):
            continue
        os.makedirs(filenamebase)
        # Copy the briefcase into its folder as a .tar.gz archive.
        filenamenew = os.path.join(filenamebase, filenamebase + ".tar.gz")
        shutil.copy(file, filenamenew)
        # Extract the archive into the same folder.
        with tarfile.open(filenamenew) as tar:
            tar.extractall(filenamebase)
        # toc.xml lists the briefcase entries; write each "name" attribute on its own line.
        root = ET.parse(os.path.join(filenamebase, 'toc.xml')).getroot()
        with open(os.path.join(filenamebase, filenamebase + ".txt"), 'a') as namefile:
            for child in root:
                namefile.write(child.attrib["name"] + "\n")

 

Here is what the script does:

#1. Copies the briefcase as a GZIP file (.tar.gz) and extracts it.
#2. Moves everything into a folder named after the briefcase.
#3. Finds the extracted file, "toc.xml", and parses it. Every time it reads a filename, it writes that filename to a text file. The text file has the same name as the briefcase.
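
If you only need the names and not the extracted files, the same steps can probably be done without writing anything to disk. This is a rough sketch, not what we actually run: the function name briefcase_names is made up for illustration, and it assumes (as the script above does) that the briefcase opens as a tar archive and that each direct child of the toc.xml root carries a "name" attribute.

```python
import os
import tarfile
import xml.etree.ElementTree as ET


def briefcase_names(briefcase_path):
    """Return the "name" attribute of each top-level entry in toc.xml."""
    with tarfile.open(briefcase_path) as tar:
        # Find toc.xml by base name, since the stored path may carry a "./" prefix.
        member = next(
            (m for m in tar.getmembers()
             if m.isfile() and os.path.basename(m.name) == "toc.xml"),
            None,
        )
        if member is None:
            raise FileNotFoundError("toc.xml not found in " + briefcase_path)
        # Parse the XML straight from the archive, without extracting it.
        root = ET.parse(tar.extractfile(member)).getroot()
    # Skip any child element that has no "name" attribute.
    return [child.attrib["name"] for child in root if "name" in child.attrib]
```

Calling briefcase_names("example.lfb") (a placeholder file name) returns a list of names that you can write to a text file or compare against your audit list. Since nothing is extracted, there is nothing to clean up afterwards.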
 

You may need to add a cleanup step to the script that leaves only the text file behind. We did this with system commands, since most briefcases extract as read-only, and left only the text file in the folder named after the briefcase.
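
We did our cleanup with system commands, but something like the following might work in Python instead. Treat it as a sketch under assumptions: clean_folder is a made-up name, and it assumes the extracted content is plain files and folders (no symbolic links). It clears the read-only flag on each item before deleting it and keeps only the one file you name.

```python
import os
import stat


def clean_folder(folder, keep):
    """Delete everything under folder except the top-level file named keep."""
    # Walk bottom-up so each subfolder is empty by the time it is removed.
    for dirpath, dirnames, filenames in os.walk(folder, topdown=False):
        for name in filenames:
            if dirpath == folder and name == keep:
                continue
            path = os.path.join(dirpath, name)
            # Clear the read-only flag first so the delete is not refused.
            os.chmod(path, stat.S_IWRITE)
            os.remove(path)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Make the subfolder writable, then remove it (it is empty by now).
            os.chmod(path, stat.S_IRWXU)
            os.rmdir(path)
```

For a briefcase folder named "example" (a placeholder), clean_folder("example", "example.txt") would leave only example.txt in that folder.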

I don't know if anyone else has run into this issue, but I wanted to post a method that worked for us.
