Question

Use Workflow to Identify Document Volume and File Path

asked on June 7, 2017

We are looking to begin deleting old entries from our repository.  For audit purposes, we need to identify the location of each document and insert that data into a SQL table.

 

We are using Workflow to locate the documents, insert the data into SQL, and then delete the entries.  Locating the folder path within the repository is easy enough, but we are hoping to use Workflow to identify the Volume and File Path of the document on the server.  (The purpose is that if we ever need to restore a document, we can go into our long-term backups and restore that one exact file to the server itself.)

 

I can see the Volume and File Path by going to the document itself in the repository; for TIFFs or other image files, the file path can be seen under the page info.

 

Would anyone have suggestions on how to find this information using Workflow?  (Assuming it is even possible)

 

Thank you for any help or suggestions.

 

We are using 10.2.1

 


Replies

replied on June 7, 2017

Evan,

I don't believe this information is readily available using any built-in workflow activities. However, you should be able to grab it all relatively easily using SDK scripts within Workflow.

The general attributes you would probably want can be obtained using something like this:

// This will provide access to document info and volume name
DocumentInfo doc = Document.GetDocumentInfo(entryId, session);

// This is needed for page info
PageInfo pgInfo = doc.GetPageInfo(1);

// This is needed to find the path of Electronic Documents
DocumentStatistics docStats = doc.GetStatistics();

string volumeName = doc.VolumeName;
string pagePath = pgInfo.ImageFilePath;
string edocPath = docStats.ElecDocumentPath;

Robert posted about getting the eDoc path here. To get the path of each individual page, however, you would need to use GetPageInfos() instead of GetPageInfo(), set the page range, then iterate through each page to get its path.

The results for both Path attributes should include the file name and the extension, so combining that with the Volume name should give you most of the path (except the drive letter or server).

If you need to take it a step further and get the actual drive location of the volume:

// Use the volumeName variable obtained in the previous example
VolumeInfo vInfo = Volume.GetInfo(volumeName);

// If your volumes have fixed paths
string FixedPath = vInfo.FixedPath;

// If your volumes have removable paths
string RemovablePath = vInfo.RemovablePath;
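
To illustrate how the pieces might fit together, here is a minimal sketch (not from the original reply) that joins the volume's fixed path with the page path from the earlier snippet. Whether ImageFilePath is relative to the volume root is an assumption you should verify against your own repository:

// Minimal sketch, reusing vInfo and pagePath from the snippets above.
// Assumes the page path is relative to the volume's fixed path --
// verify that assumption on your own system before relying on it.
string fullPagePath = System.IO.Path.Combine(vInfo.FixedPath, pagePath);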
replied on June 30, 2017

Sorry for the delayed reply.

 

When you run this SDK script, how are you setting the entryId and session variables?  And where are you retrieving them from?

 

Thanks

replied on June 30, 2017

If you are running this in a Workflow SDK script, you set a target entry in the designer like you would for other activities, and then "BoundEntryID" will give you access to that entry's ID in the SDK Script (save it to a variable, or use it directly).

int entryID = BoundEntryID;

For the session, in a Workflow SDK script you can just use RASession and it will use whatever connection profile is associated with the workflow (or the one specified for the activity if you have more than one profile in the workflow).

DocumentInfo doc = Document.GetDocumentInfo(BoundEntryID, RASession);

Just make sure you add the appropriate references in the Script Editor:

  • Laserfiche.DocumentServices
  • Laserfiche.RepositoryAccess
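
Putting those pieces together, here is a minimal sketch of a script body (assuming the references above are added and the corresponding namespaces imported); the "Volume Name" token name is just illustrative:

// Minimal sketch of a Workflow SDK Script body. "Volume Name" is an
// illustrative token name, and SetTokenValue is assumed to be
// available alongside the SetMultiValueToken used later in this thread.
protected override void Execute()
{
    DocumentInfo doc = Document.GetDocumentInfo(BoundEntryID, RASession);
    SetTokenValue("Volume Name", doc.VolumeName);
}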
replied on June 30, 2017

You don't need to get the entry yourself; Workflow already has it.

DocumentInfo doc = (DocumentInfo) this.BoundEntryInfo;

 

replied on June 30, 2017

Thanks Miruna! I totally missed that one.

replied on July 3, 2017

I believe I've modified the script as you recommended, although I'm not 100% sure, as I've not worked with scripts like this before.

 

// This will provide access to document info and volume name
DocumentInfo doc = (DocumentInfo) this.BoundEntryInfo;

// This is needed for page info
PageInfoReader pgInfo = doc.GetPageInfos();

// This is needed to find the path of Electronic Documents
DocumentStatistics docStats = doc.GetStatistics();

string volumeName = doc.VolumeName;
string pagePath = pgInfo.Item.ImageFilePath;
string edocPath = docStats.ElecDocumentPath;

 

When I run the script to test it, I select a file to test the process on, run it, and get an error message.  [a screenshot of the error was attached here]

Any suggestions?  Thanks for the help and patience.

replied on July 3, 2017

Evan,

The problem you're running into is that you're retrieving PageInfos, which is a Collection representing all of the pages, but the code is treating it like an individual page.

If you grab an entire collection like that, you need to iterate through each page one-by-one to get the path/file name of the individual pages and collect them as you go.

You can do this in more than one way, but here is an example of one method that works:

// access document information
DocumentInfo doc = (DocumentInfo) this.BoundEntryInfo;

// returns a Collection of pages even if it is only one page
// so the variable name is somewhat inaccurate/misleading
//PageInfoReader pgInfo = doc.GetPageInfos(); 

// this is a more accurate variable name
PageInfoReader pageReader = doc.GetPageInfos(); 

// for finding path of electronic documents
DocumentStatistics docStats = doc.GetStatistics();

string volumeName = doc.VolumeName;

// this will not work with a collection
//string pagePath = pgInfo.Item.ImageFilePath;

string eDocPath = docStats.ElecDocumentPath;

/*****************************************************/
/*************** CHANGES START HERE ******************/

// You need an array to store info for multiple pages
string[] pagePaths = new string[doc.PageCount];

// Iterate through each page in the collection
// This is where you'll actually get the page info
foreach(PageInfo pgInfo in pageReader){
    // save the current page information to the array created earlier
    // pgInfo.PageNumber - 1 is important (array indexes start at 0, not 1)
    pagePaths[pgInfo.PageNumber - 1] = pgInfo.ImageFilePath;
}

// Creates a multi-value token with every path saved in the loop
SetMultiValueToken("Page Paths", pagePaths, true);
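
Once the script has run, later activities in the same workflow (for example, the one inserting rows into your SQL audit table) can reference the result as the multi-value token %(Page Paths).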

 

replied on July 3, 2017

Side question: Why do you need this extra table? You can restore the SQL database and look at the toc and doc tables to get the volume location, then use those to look in your backups. Also, the location can change if the document is migrated from one volume to another.

replied on July 3, 2017

Miruna brings up a good point. If you need any of the metadata in addition to the images/pages (creation date, document name, etc.), then tracking the file path is not going to be enough to recover the deleted entries.

Currently, all you would be getting is a bunch of single-page TIFF images and/or electronic documents with no file extension (on the volume, eDocs are generic "File" types). So restoring everything would take a lot more effort than you might be expecting.

replied on July 3, 2017

The simple summary of why we're looking to find these specific files: in the event that we need to restore a file, it makes far more sense to reference the location of the image files and pick and choose which to restore, even if that means assembling the pages together manually and re-entering the metadata by hand.

 

To us, that is preferable to restoring 100+ GB of a database just to find a small handful of files.

replied on July 3, 2017

I suppose that makes sense, but be aware that you're going to lose any annotations, and you'll need to fix the extensions for your electronic documents.

Another option to explore is the BriefcaseExporter class in an SDK Script to create a Laserfiche-ready "backup" file, with the following benefits:

  • Everything can be packaged up in a single file
    • metadata, pages, electronic documents, annotations, etc.
  • Straightforward import
    • Drag and drop in the client
    • Import Agent
    • Workflow/SDK Script
  • Fewer changes to the LF Entry after recovery
    • A new EntryID and Owner/Creator, but
    • Options to preserve folder structure
    • Creation date, etc., remain the same

 

This can be accomplished with the following code:

// Set the briefcase output path and file name
// Choose a path the service account can access
string outputPath = @"\\servername\C$\Temp\";
string fileName = this.BoundEntryInfo.Name + ".lfb";
            
// Initialize the briefcase exporter
BriefcaseExporter bcExp = new BriefcaseExporter(RASession);

// Add your entry
bcExp.AddEntry(this.BoundEntryId);

// export the briefcase to the output file
bcExp.Export(outputPath + fileName);

You can create a briefcase for each file, export search results, process entries in batches, and so on.
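
For the batch case, here is a minimal sketch (not from the original post); entryIds is a hypothetical collection that you would populate yourself, for example from search results:

// Minimal sketch: package several entries into one briefcase.
// entryIds is hypothetical -- gather your own IDs, e.g. from a search.
List<int> entryIds = new List<int> { 101, 102, 103 };

BriefcaseExporter batchExporter = new BriefcaseExporter(RASession);

foreach (int id in entryIds)
{
    // each entry is added to the same briefcase
    batchExporter.AddEntry(id);
}

// everything is packaged into a single .lfb file
batchExporter.Export(@"\\servername\C$\Temp\Batch.lfb");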

replied on July 5, 2017

The other thing to consider is the traffic this workflow would generate on the repository, since you'd need it to check all changes to see if the "backup" database needs to be updated. You'd also need to track all deletions (though currently Workflow can't distinguish between an entry being sent to the recycle bin and an entry being purged from the recycle bin).

 
