Greetings!
So, who is dealing with HUGE data?
We have an application which extracts data. The result is a folder with millions of files, small to large, sometimes totaling 500GB or more. Each such folder is a single record whose files need to be kept together. When used, they need to be brought back down to the user and then opened up in a client application.
We need to save this folder and wrap it in metadata, retention, audit trail, etc.; otherwise it would probably just live on a server filesystem somewhere.
I think we're fine with the metadata, volume, etc. handling, but in general we're having a hard time dealing with records of this size.
I did some POC tests, and downloading everything as needed isn't really user friendly.
Discussion
Dealing with HUGE data!
I think the big question is what is the user experience going to be like? A million files is not a lot for a Laserfiche system, but we do not recommend putting them all in the same folder for users to scroll through. So either they need to be in an understandable folder hierarchy, or they have to be indexed well enough that having search be the main discovery method is feasible.
What kind of documents are these? If they are file types that are handled natively by the web client (Office, pdf, image, media), then that should be a good user experience. If they need to download files to open with an installed application, which I think is the case in your scenario, that is not going to be as good. And you have to consider whether downloading the whole batch of files to the local machine would be contrary to best practices in the audited environment you are trying to build.
Hi Michael - what exactly are these records and what's the overall goal here?
Hi Tessa!
These are forensic extractions. These will show up as a folder with all those files in it.
The access requirements are very tight: least access. The retention criteria need to be set and followed. An audit trail needs to be generated to show that there was no unauthorized access, etc. It would be great to wrap the metadata around it too.
The end user needs to be able to open a file in this folder of files with a client application.
I'm even picturing having a zip file or something and using that as the record, with the process being to download the file, extract it locally, then delete it when done. But this is a much more manual process than I'd like.
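To make the zip idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration using the standard library: the function names `pack_extraction` and `checked_out` are made up, it does not touch any Laserfiche API, and it assumes the archive is written somewhere outside the folder being zipped. The SHA-256 of the archive is returned so it could be stored as metadata alongside the record.

```python
import hashlib
import shutil
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path


def pack_extraction(folder: Path, archive: Path) -> str:
    """Zip every file under `folder` into `archive` and return the archive's SHA-256.

    Hypothetical helper: assumes `archive` lives outside `folder`.
    """
    # allowZip64 is needed once the archive or file count exceeds classic zip limits.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the extraction folder.
                zf.write(path, path.relative_to(folder).as_posix())

    # Hash the finished archive in chunks so large files never sit in memory.
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


@contextmanager
def checked_out(archive: Path):
    """Extract `archive` to a temporary working folder and delete it on exit."""
    workdir = Path(tempfile.mkdtemp(prefix="extraction_"))
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(workdir)
        yield workdir
    finally:
        # Remove the local copy even if the client application step fails.
        shutil.rmtree(workdir, ignore_errors=True)
```

Usage would be something like `with checked_out(Path("record.zip")) as work:` followed by pointing the client application at `work`. The local copy goes away when the block exits, which at least takes the "remember to delete it" step off the user.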
Thoughts?
Hi Michael,
I have no experience with huge data, but had a few questions. It is always helpful when I bounce things off of others, so I hope you don't mind my questions.
It sounds like you are looking for this other application to drop the folder in Laserfiche so that Laserfiche can create its own folder and put all the files in it, as well as set the access and metadata.
How would the metadata/search be set up so that a user can search for files? Or are they going to need to look at the bulk of documents that are in the folder? And will they need that other application to view the documents because they are in a proprietary format?
From what I have experienced, a document management system can only track what is done in its own application. If a file is opened outside of the system, and that application allows a user to save or print or whatnot, none of that is tracked, unless your other application keeps track of it.
Since you talk about an audit trail and retention, I don't know how saving it or viewing it outside of a system like Laserfiche would work. But I understand your concern about whether Laserfiche can handle folders with millions of documents in them, and how the usability would work.
I know there are customers that have millions of documents in their system, but I'm not sure about millions in individual folders. It is hard to visualize without knowing more about this other application and how your end users work.