Hello,
Has anyone built a workflow they can share with me which can be run on an already scanned document to get rid of the blank pages included in the document from when it was scanned? Thank you!
Dawn
Hi there,
This can be done with a script:
Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
    For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
        ' Keep black and white pages whose image size is above 2000
        If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize > 2000 Then
            newImage.add(oLFPageInfo.ReadPagePart)
        ' Keep colour pages whose image size is above 9000
        ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize > 9000 Then
            newImage.add(oLFPageInfo.ReadPagePart)
        End If
    Next
End Using
document.image = newImage
Or something like that. Blank B&W images are smaller than blank colour images, hence checking the size twice.
Checking the size isn't a perfect method. I think Quick Fields defaults to a 3 KB threshold for blank pages, but a bad scanner or a dirty glass will produce a blank-looking page larger than that. I suspect you know that of course!
As Ben stated and somewhat showed, you can check the PageInfo for image size and remove pages that are smaller than a predefined amount. Be careful not to set the value too high, or it will delete pages that have only a small amount of data on them. Here is a simple workflow script that deletes a page if it is black and white and its image size is less than the value of BWBlankMax, or if it is not black and white and its image size is less than the value of OtherBlankMax. This will catch some blanks, but if the page is a dirty scan (a lot of speckling) or has punched holes, a ragged edge, or a border, the size will be above the threshold and the page will not be deleted.
Here is the code for a VB SDK Script:
Protected Overrides Sub Execute()
    ' Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
    If BoundEntryInfo.EntryType = EntryType.Document Then
        Dim sBWBlankMax As String = GetTokenValue("BWBlankMax")
        Dim sOtherBlankMax As String = GetTokenValue("OtherBlankMax")
        Dim iBWBlankMax As Integer
        Dim iOtherBlankMax As Integer
        ' Get the black and white blank page size value
        If Integer.TryParse(sBWBlankMax, iBWBlankMax) Then
            ' Get the non black and white blank page size value
            If Integer.TryParse(sOtherBlankMax, iOtherBlankMax) Then
                Try
                    ' Cast the bound entry to DocumentInfo
                    Using oLFDocInfo As DocumentInfo = DirectCast(BoundEntryInfo, DocumentInfo)
                        ' Get a PageInfoReader with the pages from the DocumentInfo
                        Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
                            ' Iterate through each PageInfo
                            For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
                                ' Check for black and white and image size
                                If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize < iBWBlankMax Then
                                    ' Delete pages whose image size is smaller than the set point
                                    oLFPageInfo.Delete()
                                ' Check for non black and white and image size
                                ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize < iOtherBlankMax Then
                                    ' Delete pages whose image size is smaller than the set point
                                    oLFPageInfo.Delete()
                                End If
                                ' Save any changes to the PageInfo object
                                oLFPageInfo.Save()
                            Next
                        End Using
                    End Using
                Catch ex As Exception
                    ' Report error message
                    WorkflowApi.TrackWarning(ex.Message)
                End Try
            End If
        End If
    End If
End Sub
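Since a threshold set too high will delete real pages, it may help to try the values out before letting the script delete anything. The following is only a sketch of the same depth and size test pulled out into a standalone function, with a dry run that lists which pages would be removed. The names IsLikelyBlank and FindBlankCandidates are made up for illustration and are not part of the Laserfiche SDK; the page list is assumed to be (ImageDepth, ImageDataSize) pairs that you have collected yourself.

```vb
Imports System
Imports System.Collections.Generic

Module BlankPageCheck

    ' Hypothetical helper (not an SDK call): applies the same test as the
    ' script above. Depth 1 is compared against bwBlankMax, any greater
    ' depth against otherBlankMax. Pages with no image depth are kept.
    Public Function IsLikelyBlank(imageDepth As Integer, imageDataSize As Long,
                                  bwBlankMax As Long, otherBlankMax As Long) As Boolean
        If imageDepth = 1 Then
            Return imageDataSize < bwBlankMax
        ElseIf imageDepth > 1 Then
            Return imageDataSize < otherBlankMax
        End If
        Return False
    End Function

    ' Dry run: returns the 1-based positions of the pages that would be
    ' deleted, without touching the document.
    ' Each tuple is assumed to be (ImageDepth, ImageDataSize).
    Public Function FindBlankCandidates(pages As IList(Of Tuple(Of Integer, Long)),
                                        bwBlankMax As Long, otherBlankMax As Long) As List(Of Integer)
        Dim candidates As New List(Of Integer)
        For i As Integer = 0 To pages.Count - 1
            If IsLikelyBlank(pages(i).Item1, pages(i).Item2, bwBlankMax, otherBlankMax) Then
                candidates.Add(i + 1)
            End If
        Next
        Return candidates
    End Function

End Module
```

Running this against a few documents where you already know which pages are blank should show whether BWBlankMax and OtherBlankMax need adjusting before the real script is turned loose.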
I wish this were possible, and I hope I am wrong and that it is possible. The issue is that Workflow can't tell when a page is blank. Quick Fields can, because it processes images, but Workflow doesn't look at the image pages of documents.
We have tried to accomplish this in the past by running OCR on documents to see if any pages had no OCR text, but it wasn't reliable enough to delete the pages without worrying that we were deleting pages with logos, etc.
If you are OK with that level of certainty, using DCC to OCR the documents and then creating a workflow to delete pages without any OCR text might work for you.
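For what it's worth, the "no OCR text" check could look roughly like the sketch below. It assumes you have already pulled each page's text into a list of strings (how you read the text from the repository is not shown here), and FindPagesWithoutText is a made-up name, not an SDK call. It only reports page numbers instead of deleting, so pages that are just a logo or a stamp can be looked at first.

```vb
Imports System
Imports System.Collections.Generic

Module EmptyTextCheck

    ' Hypothetical helper: pageTexts(i) is assumed to hold the OCR text of
    ' page i + 1, or Nothing when the page has no text at all.
    ' Returns the 1-based numbers of pages whose text is empty or whitespace.
    Public Function FindPagesWithoutText(pageTexts As IList(Of String)) As List(Of Integer)
        Dim result As New List(Of Integer)
        For i As Integer = 0 To pageTexts.Count - 1
            If String.IsNullOrWhiteSpace(pageTexts(i)) Then
                result.Add(i + 1)
            End If
        Next
        Return result
    End Function

End Module
```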