Question

Workflow: How to check a file against a found document and its versions to see whether it is a new version or an old one

asked on April 3, 2014

 Hi Laserfiche Staff and community,

 

This is a two-part question, but both parts relate to the same problem.

 

The problem is: build a workflow that runs when documents are placed in certain locations. The workflow searches other locations in the repository for the file, based on its file name. When one or more files of the same name are found in a specific directory, the workflow compares each found file with the starting entry to see whether they are identical. If they are not, it checks the found file's other versions. If the starting entry matches any version of the found document, the starting entry is deleted. If no match is found, the starting entry is treated as a new version of the found document, so the version history of the file is preserved.

 

Question Part 1: Is there a way (in Workflow) to compare a file/document/TIFF with another file/document/TIFF and (if available) its various versions, to see if a match is found?

Question Part 2: If no match is found between the starting entry and the found file, can the starting entry be treated as a new version of the found file, so as to preserve the version history of the file?


Replies

replied on January 6, 2016

Good morning Rene,

You're getting the error because the DocumentInfo object doesn't contain the electronic document itself. It contains other objects that eventually contain the electronic document.

You would have to extract the file into a LaserficheReadStream, MemoryStream, FileStream or byte array using DocumentInfo.ReadEdoc. However, if you are going to create a hash for each document, how about turning on "Enable Checksum" on the document volume (through the LF Admin Console)? A SHA1 checksum will then be created for every document, and it is much easier to read one than to generate one...
 

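Try
    'Get the document this script activity is bound to...
    Dim MyDocument As DocumentInfo = DirectCast(Me.BoundEntryInfo, DocumentInfo)
    Dim strMimeType As String = ""
    'Open the electronic document as a read stream...
    Dim oLFReadStream As LaserficheReadStream = MyDocument.ReadEdoc(strMimeType)
    'Store the stream's ComputedChecksum value in a token...
    SetToken("field_test", oLFReadStream.ComputedChecksum)
Catch ex As Exception
    MsgBox(ex.Message)
    'Report any errors to workflow...
    WorkflowApi.TrackError(ex.Message)
End Try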

 

replied on January 6, 2016

Thanks, Ben, for the help.

I made it work by using the code below.

It works only for Laserfiche documents that have an electronic file, such as a PDF or Word document. If there is no electronic file, as with a Laserfiche document that has only TIFF image pages, the hash always comes back the same. I imagine that is because it is computed from the electronic file and not from the Laserfiche image (TIFF).

So I need to find a way to read the Laserfiche image when there is no electronic file.

ReadEdoc seems to read the electronic file; what about the Laserfiche (TIFF) image?

 

Try
    'Requires Imports System.Security.Cryptography at the top of the script.
    Dim MyDocument As DocumentInfo = DirectCast(Me.BoundEntryInfo, DocumentInfo)
    Dim strMimeType As String = ""

    'Open the electronic document as a read stream...
    Dim oLFReadStream As LaserficheReadStream = MyDocument.ReadEdoc(strMimeType)

    'Hash the stream contents with MD5, then format the 16 hash bytes
    'as a 32-character GUID string without dashes...
    Dim hash As MD5 = MD5.Create()
    Dim oHashBytes As Byte() = hash.ComputeHash(oLFReadStream)
    Dim oGUID As New Guid(oHashBytes)
    Dim strTmp As String = oGUID.ToString().Replace("-", "").ToUpper()

    SetToken("field_test", strTmp)
    'SetToken("field_test", oLFReadStream.ComputedChecksum)

Catch ex As Exception
    MsgBox(ex.Message)
    'Report any errors to workflow...
    WorkflowApi.TrackError(ex.Message)
End Try
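
One possible explanation for the identical hashes, offered as an assumption and not confirmed anywhere in this thread: if ReadEdoc returns an empty stream for a document with no electronic file, then the MD5 of zero bytes is the same for every such document. The sketch below is in Python (not the VB.NET used in this thread) only so it can be run outside Workflow; it shows that hashing empty input always gives one constant value.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 uppercase hex characters."""
    return hashlib.md5(data).hexdigest().upper()

# Two "documents" with no electronic file: both yield zero bytes to hash.
empty_a = md5_hex(b"")
empty_b = md5_hex(b"")

# A document with real content (placeholder bytes, purely illustrative).
with_content = md5_hex(b"placeholder electronic document bytes")

print(empty_a == empty_b)       # compares the two empty-input hashes
print(empty_a == with_content)  # compares empty input against real content
```

If that is the cause, checking the stream length before hashing would tell the two cases apart.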

 

replied on January 6, 2016

In that case, you could export the document first with DocumentExporter, or use MyDocument.AllPages and calculate the checksum that way. I'm not sure which is faster, though.
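
A rough sketch of hashing page by page, in Python purely for illustration (the scripts in this thread are VB.NET). The page bytes here are placeholders; how each page's bytes would be read from AllPages is not shown in this thread, so that part is left out. Each page is fed into one running MD5 in order, with a length prefix so that page boundaries affect the result.

```python
import hashlib

def hash_pages(pages):
    """Hash a sequence of page byte strings into one MD5 hex string.

    Page order matters, and each page is length-prefixed so that
    [b"ab", b"c"] and [b"a", b"bc"] do not hash to the same value.
    """
    md5 = hashlib.md5()
    for page in pages:
        md5.update(len(page).to_bytes(8, "big"))  # 8-byte length prefix
        md5.update(page)
    return md5.hexdigest().upper()

# Placeholder page contents standing in for exported page images.
original = hash_pages([b"page-1", b"page-2"])
reordered = hash_pages([b"page-2", b"page-1"])
print(original == reordered)
```

Because the hash depends on page order, moving pages around changes the value, so a stored hash would need recalculating after such an edit.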

replied on January 6, 2016

Thanks. For the Laserfiche image I use the code below, and it works, thanks to your help.

Next I will add a condition that checks whether the document has an electronic file or only Laserfiche images, and use different VB code for each case.

Try
    'Requires Imports System.IO and System.Security.Cryptography at the top of the script.
    Dim docInfo As DocumentInfo = DirectCast(Me.BoundEntryInfo, DocumentInfo)

    Dim exporter As New DocumentExporter()
    'Initialize the memory stream that receives the exported pages...
    Dim stream As New MemoryStream()

    'Set the exported image format to TIFF...
    exporter.PageFormat = DocumentPageFormat.Tiff

    'Export the image pages to the memory stream...
    exporter.ExportPages(docInfo, docInfo.AllPages, stream)

    'Copy the whole stream into a byte array. ToArray does not depend on
    'the current stream position, unlike reading through a BinaryReader...
    Dim image As Byte() = stream.ToArray()

    'Hash the image bytes with MD5, then format the 16 hash bytes
    'as a 32-character GUID string without dashes...
    Dim hash As MD5 = MD5.Create()
    Dim oHashBytes As Byte() = hash.ComputeHash(image)
    Dim oGUID As New Guid(oHashBytes)
    Dim strTmp As String = oGUID.ToString().Replace("-", "").ToUpper()

    SetToken("field_test", strTmp)

Catch ex As Exception
    MsgBox(ex.Message)
    'Report any errors to workflow...
    WorkflowApi.TrackError(ex.Message)
End Try
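
A general point about in-memory streams, independent of Laserfiche: writing to a stream leaves the position at the end, so a read that starts from the current position returns nothing unless the stream is rewound first. Whether ExportPages rewinds the stream is not stated in this thread. The Python sketch below (Python only so it runs outside Workflow) shows the behavior with io.BytesIO, which is analogous to a .NET MemoryStream in this respect.

```python
import io

stream = io.BytesIO()
stream.write(b"exported image bytes")  # placeholder for exported page data

# Reading from the current position (the end) yields no bytes.
read_without_rewind = stream.read()

# Rewinding first makes the whole content readable.
stream.seek(0)
read_after_rewind = stream.read()

# getvalue() returns the full buffer regardless of position,
# comparable to MemoryStream.ToArray() in .NET.
whole_buffer = stream.getvalue()

print(len(read_without_rewind), len(read_after_rewind), len(whole_buffer))
```

Hashing an empty byte array would give the same value for every document, so it is worth confirming that the array is not empty before trusting the hash.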

 

replied on January 6, 2016

Glad it's working.

Will you be at the conference? I'd like to chat, as there are a number of opportunities to consider...

1. A user stores an edoc, then generates pages. Would you checksum just the edoc?

2. Recalculating the version when an edoc is updated.

3. Recalculating the pages for a document when they are moved around.

4. And so many more, I'm sure!

-Ben

Skype: ben.birns
LinkedIn: uk.linkedin.com/in/benbirns
 

 

replied on January 6, 2016

I was there last year, but I am skipping this year. I hope to be there next year.

At present we are responding to an RFP for a city hall, and one of their requirements is managing duplicate files.

Yes, we have a lot to consider, but I thought I would do the check when the files are imported into Laserfiche and again when they move to their final destination (a record series folder).

replied on January 7, 2016

I've been setting up a lot of transparent RM systems, so documents are in a file series the moment they have metadata... That's probably why I have extra issues to consider...

replied on April 7, 2014

Hi Farhan,

 

I'm curious about your use case, but it sounds like you're trying to build a solution that manages duplicates in Laserfiche. Hope this works out!

 

Here is an approach I would try. My main concern is the potential workload on the server, depending on the rate of document creation and modification.

 

For simplicity I would only check the new document (or revision) against the current revision of all documents on the system, not all versions of every document. 

 

I would create a workflow that does the following:

  1. Trigger when a document is created or updated.
  2. Generate an MD5 hash, or a GUID built from an MD5 hash, with an SDK Script activity (see below).
  3. Run a standard field search across the repository to see if the GUID exists on the template of any existing document (in a hidden field called "MD5 Hash", for example).
  4. If the GUID can't be found, assign it to the new document's template, in the field called "MD5 Hash".
  5. If the GUID is found, then this document is a duplicate; do something about it.

 

How does that sound?
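
The steps above can be sketched as follows. This is Python for illustration only (the actual script would be VB.NET inside Workflow). The dictionary is a hypothetical stand-in for the repository field search on "MD5 Hash", and the entry IDs and function names are made up for the example.

```python
import hashlib

def content_key(data: bytes) -> str:
    """Step 2: build a fixed-length key from the document contents."""
    return hashlib.md5(data).hexdigest().upper()

def register_document(index: dict, entry_id: int, data: bytes):
    """Steps 3 to 5: look the key up, then either flag or record it.

    Returns the entry ID of an existing document with the same content,
    or None if this content is new (in which case it is recorded).
    """
    key = content_key(data)
    existing = index.get(key)          # stand-in for the field search
    if existing is not None and existing != entry_id:
        return existing                # duplicate: caller decides what to do
    index[key] = entry_id              # stand-in for setting the field
    return None

index = {}
first = register_document(index, 101, b"contract v1")
second = register_document(index, 102, b"contract v1")  # same bytes as 101
third = register_document(index, 103, b"contract v2")
print(first, second, third)
```

The lookup only compares against the key stored for each existing document, which matches the suggestion to check the current revision only and keeps the per-document cost to one hash and one search.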

 

'First create some code to extract the electronic document into a MemoryStream object.
'Then write the MemoryStream object to a byte array (oElecBytes).
'Requires Imports System.Security.Cryptography at the top of the script.

Dim oHashBytes As Byte() = MD5.Create().ComputeHash(oElecBytes)
Dim oGUID As New Guid(oHashBytes)
Dim strTmp As String = oGUID.ToString().Replace("-", "").ToUpper()

'That's all you need to create a unique identifier for a document's contents. I convert the MD5 hash into a GUID to ensure a fixed hash size of 32 characters, because I like a consistent length for information stored in Laserfiche fields.
  1. Check the template for a hidden field called "GUID Hash"
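
One detail worth knowing about the GUID step: an MD5 hash is always 16 bytes, so its plain hex form is already 32 characters. The .NET Guid(Byte()) constructor reads the first three fields as little-endian, so the GUID string has the same length but a different byte order from the plain MD5 hex. The Python sketch below reproduces that ordering with uuid.UUID(bytes_le=...); Python is used only so the comparison can be run outside Workflow.

```python
import hashlib
import uuid

data = b"hello"
digest = hashlib.md5(data).digest()  # always 16 bytes

# Plain MD5 hex, as most hashing tools would print it.
plain_hex = digest.hex().upper()

# Same bytes laid out the way .NET's Guid(Byte()) constructor reads them:
# the first 4, next 2, and next 2 bytes are treated as little-endian.
guid_hex = uuid.UUID(bytes_le=digest).hex.upper()

print(plain_hex)  # 5D41402ABC4B2A76B9719D911017C592
print(guid_hex)   # 2A40415D4BBC762AB9719D911017C592
```

Either form works as a lookup key as long as every document is hashed the same way; the difference only matters when comparing against MD5 values produced by other tools.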

 

replied on July 22, 2014

Hi Farhan,

 

Curious to know if you managed to get the above working within workflow as I am trying to code something up at the moment and would be a great help if you have already implemented something.

 

Many thanks

Tina

replied on January 5, 2016

I have tried this code but get these errors.

Any ideas?

The goal is to get an MD5 hash for each new entry, for a file of any type, to eventually find duplicate files.

Try

   'Dim MyDocument As DocumentInfo = Me.BoundEntryInfo

Dim MyDocument As Laserfiche.RepositoryAccess.IDocumentContents = me.BoundEntryInfo

Dim utf8 As Encoding = Encoding.UTF8
Dim tbytes As Byte() = utf8.GetBytes(MyDocument)

dim hash = md5.create
dim oHashBytes As Byte() = hash.ComputeHash(tbytes)
dim oGUID = New Guid(oHashBytes)
dim strTmp = oGUID.ToString.Replace("-", "").ToUpper

SetToken("field_test", strTmp)

Catch ex As Exception

    'Report any errors to workflow...
    WorkflowApi.TrackError(ex.message)

End Try

 
