Hello,
Has anyone built a workflow they can share with me which can be run on an already scanned document to get rid of the blank pages included in the document from when it was scanned? Thank you!
Dawn
Hi there,
This can be done with a script:
    ' Rough sketch only: newImage and document.image are placeholders for
    ' however the kept pages get reassembled, not real SDK members.
    Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
        For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
            ' Keep black-and-white pages whose image data size is above 2000
            If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize > 2000 Then
                newImage.add(oLFPageInfo.ReadPagePart)
            ' Keep greyscale/colour pages whose image data size is above 9000
            ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize > 9000 Then
                newImage.add(oLFPageInfo.ReadPagePart)
            End If
        Next
    End Using
    document.image = newImage
Or something like that. Blank B&W images are smaller than blank colour images, hence checking the size twice.
Checking the size isn't a perfect method. I think Quick Fields by default treats anything under 3 KB as blank, but a bad scanner or dirty glass will produce a blank-looking page larger than that. I suspect you know that, of course!
As Ben stated and partly showed, you can check the PageInfo for image size and remove pages that fall below a predefined amount. Be careful not to set the value too high, or it will delete pages that have only a small amount of data on them. Here is a simple workflow script that deletes a page if it is a black-and-white image smaller than the BWBlankMax value, or any other image smaller than the OtherBlankMax value. This will catch some blanks, but if the page is a dirty scan (a lot of speckling) or has punched holes, a ragged edge, or a border, the size will be above the threshold and the page will not be deleted.
Here is the code for a VB SDK Script:
    Protected Overrides Sub Execute()
        'Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
        If BoundEntryInfo.EntryType = EntryType.Document Then
            Dim sBWBlankMax As String = GetTokenValue("BWBlankMax")
            Dim sOtherBlankMax As String = GetTokenValue("OtherBlankMax")
            Dim iBWBlankMax As Integer
            Dim iOtherBlankMax As Integer
            ' Get Black and White Blank Page size value
            If Integer.TryParse(sBWBlankMax, iBWBlankMax) Then
                ' Get non Black and White Blank Page size value
                If Integer.TryParse(sOtherBlankMax, iOtherBlankMax) Then
                    Try
                        ' Cast the Bound Entry to DocumentInfo
                        Using oLFDocInfo As DocumentInfo = DirectCast(BoundEntryInfo, DocumentInfo)
                            ' Get PageInfoReader with pages from DocumentInfo
                            Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
                                ' Iterate through each PageInfo
                                For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
                                    ' Check for Black and White and image size
                                    If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize < iBWBlankMax Then
                                        ' Delete pages whose image size is smaller than the set point
                                        oLFPageInfo.Delete()
                                    ' Check for Non Black and White and image size
                                    ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize < iOtherBlankMax Then
                                        ' Delete pages whose image size is smaller than the set point
                                        oLFPageInfo.Delete()
                                    End If
                                    ' Save any changes to the PageInfo object
                                    oLFPageInfo.Save()
                                Next
                            End Using
                        End Using
                    Catch ex As Exception
                        ' Report error message
                        WorkflowApi.TrackWarning(ex.Message)
                    End Try
                End If
            End If
        End If
    End Sub
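Before pointing a script like this at real documents, it can help to try the thresholds on a few known page sizes without touching the repository. Below is a minimal standalone sketch of the same decision rule. IsLikelyBlank is a hypothetical helper, not part of the Laserfiche SDK, and the 2000 and 9000 values are simply the ones from Ben's example, not recommended settings.

```vb
Imports System

Module BlankPageCheck
    ' Hypothetical helper (not a Laserfiche SDK member): applies the same rule
    ' as the script above. Depth 1 is treated as black and white; any greater
    ' depth is compared against the second threshold.
    Function IsLikelyBlank(imageDepth As Integer, imageDataSize As Long,
                           bwBlankMax As Long, otherBlankMax As Long) As Boolean
        If imageDepth = 1 Then
            Return imageDataSize < bwBlankMax
        End If
        Return imageDepth > 1 AndAlso imageDataSize < otherBlankMax
    End Function

    Sub Main()
        ' Sample sizes are made up purely for illustration.
        Console.WriteLine(IsLikelyBlank(1, 1500, 2000, 9000))   ' clean B&W blank: True
        Console.WriteLine(IsLikelyBlank(1, 3500, 2000, 9000))   ' speckled B&W blank: False
        Console.WriteLine(IsLikelyBlank(24, 8000, 2000, 9000))  ' clean colour blank: True
        Console.WriteLine(IsLikelyBlank(24, 12000, 2000, 9000)) ' colour page with content: False
    End Sub
End Module
```

The second case is the dirty-scan problem described above: the page looks blank but its size is over the threshold, so it is kept.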
I wish this were possible, and I hope I am wrong and that it is. The issue is that Workflow can't tell when a page is blank. Quick Fields can, because it processes images, but Workflow doesn't look at the image pages of documents.
We have tried to accomplish this in the past by running OCR on documents to see whether any pages had no OCR text, but it wasn't reliable enough to delete pages without worrying that we were deleting pages with logos, etc.
If you are OK with that level of certainty, using DCC to OCR the documents and then creating a workflow to delete pages without any OCR text might work for you.
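As a rough illustration of the OCR-based idea, here is a small standalone sketch that takes the OCR text of each page and reports which page numbers came back empty. FindPagesWithoutText is a hypothetical helper and the sample strings are invented. How the per-page text is actually read from the repository is not shown and would depend on your setup.

```vb
Imports System
Imports System.Collections.Generic

Module OcrBlankCheck
    ' Hypothetical helper: returns the 1-based numbers of pages whose OCR text
    ' is missing, empty, or whitespace only.
    Function FindPagesWithoutText(pageTexts As IList(Of String)) As List(Of Integer)
        Dim result As New List(Of Integer)
        For i As Integer = 0 To pageTexts.Count - 1
            If String.IsNullOrWhiteSpace(pageTexts(i)) Then
                result.Add(i + 1)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Page 2 is truly blank; page 3 stands in for a logo-only page
        ' that OCR returned no text for.
        Dim texts As New List(Of String) From {"Invoice text", "", "   ", "Total due"}
        Console.WriteLine(String.Join(", ", FindPagesWithoutText(texts))) ' prints 2, 3
    End Sub
End Module
```

Note that page 3 is flagged even though it is not really blank, which is exactly the logo problem mentioned above, so a review step before deleting is probably worth having.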