Question

Posted to Higher Education

Workflow which removes blank pages from an already scanned document

asked on March 2, 2021

Hello,

Has anyone built a workflow they can share with me which can be run on an already scanned document to get rid of the blank pages included in the document from when it was scanned?  Thank you!

 

Dawn

Replies

replied on March 2, 2021

Hi there,

This can be done with a script:

Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
    For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
        ' Keep black-and-white pages whose image data is larger than 2000
        If oLFPageInfo.ImageDepth = 1 And oLFPageInfo.ImageDataSize > 2000 Then
            newImage.add(oLFPageInfo.ReadPagePart)
        ' Keep greyscale/colour pages whose image data is larger than 9000
        ElseIf oLFPageInfo.ImageDepth > 1 And oLFPageInfo.ImageDataSize > 9000 Then
            newImage.add(oLFPageInfo.ReadPagePart)
        End If
    Next
End Using
' Pseudocode: newImage and document.image stand in for rebuilding the document from the kept pages
document.image = newImage

Or something like that. Blank B&W images are smaller than blank colour images, hence checking the size twice.

Checking for size isn't a perfect method. I think Quick Fields defaults to treating anything under 3 KB as blank, but a bad scanner or a dirty glass will produce a blank-looking page larger than that. I suspect you know that of course!
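
For illustration, the size rule above can be written as a small standalone function. This is a Python sketch, not Laserfiche SDK code: `Page`, `keep_page`, and the constant names are made up for the example, the defaults 2000 and 9000 are the values from the script above (assumed here to be bytes), and the depth and size fields stand in for `ImageDepth` and `ImageDataSize`.

```python
from dataclasses import dataclass

# Thresholds taken from the script above; the unit is assumed to be bytes.
BW_KEEP_MIN = 2000      # black-and-white pages (image depth of 1)
OTHER_KEEP_MIN = 9000   # greyscale or colour pages (image depth above 1)


@dataclass
class Page:
    """Hypothetical stand-in for the values read from a page's PageInfo."""
    number: int
    image_depth: int
    image_data_size: int


def keep_page(page, bw_min=BW_KEEP_MIN, other_min=OTHER_KEEP_MIN):
    """Return True when the page is large enough to be treated as non-blank."""
    if page.image_depth == 1:
        return page.image_data_size > bw_min
    if page.image_depth > 1:
        return page.image_data_size > other_min
    # Assumption: a page with no recognisable depth is kept rather than dropped.
    return True


if __name__ == "__main__":
    scanned = [Page(1, 1, 2500), Page(2, 1, 1800), Page(3, 24, 8000)]
    print([p.number for p in scanned if keep_page(p)])
```

Using a separate threshold per depth reflects the point above that blank colour images are larger than blank B&W ones.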

replied on March 3, 2021

Ben, thanks for that. I had forgotten about using scripting to check page size. I think that would work great, and if you set the size threshold low enough then you won't accidentally delete good pages.

replied on March 3, 2021

As Ben stated and partly showed, you can check each page's PageInfo for its image size and remove pages that are smaller than a predefined amount. Be careful not to set the value too high, or it will delete pages that have only a small amount of data on them. Here is a simple workflow that deletes a page if it is a black-and-white image smaller than the value of BWBlankMax, or if it is not black and white and is smaller than the value of OtherBlankMax. This will catch some blanks, but if the page is a dirty scan (a lot of speckling) or has punched holes, a ragged edge, or a border, the size will be above the threshold and the page will not be deleted.

Here is the code for a VB SDK Script:

Protected Overrides Sub Execute()
    'Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
    If BoundEntryInfo.EntryType = EntryType.Document Then
        Dim sBWBlankMax As String = GetTokenValue("BWBlankMax")
        Dim sOtherBlankMax As String = GetTokenValue("OtherBlankMax")
        Dim iBWBlankMax As Integer
        Dim iOtherBlankMax As Integer
        ' Get Black and White Blank Page size value
        If Integer.TryParse(sBWBlankMax, iBWBlankMax) Then
            ' Get non Black and White Blank Page size value
            If Integer.TryParse(sOtherBlankMax, iOtherBlankMax) Then
                Try
                    ' Cast the Bound Entry to DocumentInfo
                    Using oLFDocInfo As DocumentInfo = DirectCast(BoundEntryInfo, DocumentInfo)
                        ' Get PageInfoReader with pages from DocumentInfo
                        Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
                            ' Iterate through each PageInfo
                            For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
                                ' Check for Black and White and image size
                                If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize < iBWBlankMax Then
                                    ' Delete pages whose image size is smaller than the set point
                                    oLFPageInfo.Delete()
                                ' Check for Non Black and White and image size
                                ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize < iOtherBlankMax Then
                                    ' Delete pages whose image size is smaller than the set point
                                    oLFPageInfo.Delete()
                                End If
                                ' Save any changes to the PageInfo object
                                oLFPageInfo.Save()
                            Next
                        End Using
                    End Using
                Catch ex As Exception
                    ' Report error message
                    WorkflowApi.TrackWarning(ex.Message)
                End Try
            End If
        End If
    End If
End Sub
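
Because a threshold set too high deletes real pages, it may help to do a dry run first and only list the pages that would be removed. The following is a standalone Python sketch of that idea, not Laserfiche SDK code: `parse_threshold` and `pages_to_delete` are hypothetical helpers, the token values are passed in as strings the way `GetTokenValue` returns them, and each page is a `(page_number, image_depth, image_data_size)` tuple with made-up sizes.

```python
def parse_threshold(raw):
    """Return the token value as an int, or None if it is not a whole number.

    Mirrors the Integer.TryParse guard in the script: a bad token means
    nothing gets deleted.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def pages_to_delete(pages, bw_blank_max, other_blank_max):
    """List page numbers that fall below the blank-page thresholds.

    pages: iterable of (page_number, image_depth, image_data_size) tuples.
    bw_blank_max / other_blank_max: raw token strings such as "2000".
    """
    bw_max = parse_threshold(bw_blank_max)
    other_max = parse_threshold(other_blank_max)
    if bw_max is None or other_max is None:
        return []  # invalid configuration: flag nothing

    flagged = []
    for number, depth, size in pages:
        if depth == 1 and size < bw_max:
            flagged.append(number)
        elif depth > 1 and size < other_max:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Illustrative values only; real sizes depend on the scanner and settings.
    sample = [(1, 1, 1500), (2, 1, 40000), (3, 24, 8000), (4, 24, 250000)]
    print(pages_to_delete(sample, "2000", "9000"))
```

Reviewing the flagged page numbers against the actual document before enabling deletion is one way to tune BWBlankMax and OtherBlankMax for a particular scanner.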

replied on March 2, 2021

I wish this were possible, and I hope I am wrong and that it is possible. The issue is that Workflow can't tell when a page is blank. Quick Fields can because it processes images, but Workflow doesn't look at the image pages of the documents.

 

We have tried to accomplish this in the past by running OCR on documents to see if any pages had no OCR text, but it wasn't reliable enough to delete the pages without worrying that we were deleting pages with logos, etc. 

 

If you are OK with that level of certainty, using DCC to OCR the documents and then creating a workflow to delete pages without any OCR text might work for you.
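
As a rough illustration of the OCR-based check, the sketch below flags pages whose OCR text is empty or nearly empty. It is standalone Python, not Laserfiche SDK code: `pages_without_text` and `min_chars` are hypothetical names, and the per-page text is assumed to have been retrieved already into a dictionary keyed by page number.

```python
def pages_without_text(ocr_text_by_page, min_chars=1):
    """Return page numbers whose OCR text has fewer than min_chars visible characters.

    ocr_text_by_page: dict mapping page number to its OCR text (or None).
    Whitespace is ignored, so a page containing only spaces or newlines
    counts as having no text.
    """
    flagged = []
    for page_number in sorted(ocr_text_by_page):
        text = ocr_text_by_page[page_number] or ""
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) < min_chars:
            flagged.append(page_number)
    return flagged


if __name__ == "__main__":
    ocr = {1: "Invoice 42", 2: "  \n", 3: "", 4: None}
    print(pages_without_text(ocr))
```

As noted above, a page that holds only a logo or a picture would also be flagged by this check, so the result is better treated as a list of candidates than as a safe delete list.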
