Question

Higher Education

Workflow which removes blank pages from an already scanned document

Laserfiche Workflow

Updated March 3, 2021

asked on March 2, 2021

Hello,

Has anyone built a workflow they can share with me which can be run on an already scanned document to get rid of the blank pages included in the document from when it was scanned? Thank you!

Dawn


Replies

replied on March 2, 2021

Hi there,

This can be done with a script:

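' newImage and document are not declared here; treat those lines as pseudocode
Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
    For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
        ' Keep black-and-white pages whose image is larger than 2000 bytes
        If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize > 2000 Then
            newImage.add(oLFPageInfo.ReadPagePart)
        ' Keep greyscale/colour pages whose image is larger than 9000 bytes
        ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize > 9000 Then
            newImage.add(oLFPageInfo.ReadPagePart)
        End If
    Next
End Using
document.image=newImage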

Or something like that. Blank B&W images are smaller than blank colour images, hence checking the size twice.

Checking for size isn't a perfect method. I think Quick Fields defaults to treating anything under 3 KB as blank, but a bad scanner or dirty glass will produce a blank-looking page larger than that. I suspect you know that of course!
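To make the size rule above easier to test outside of Workflow, here is a minimal sketch that pulls the decision into a plain function. IsLikelyBlank and its parameter names are hypothetical (they are not part of the Laserfiche SDK), and the 2000 and 9000 byte limits are just the figures from the snippet above, not recommended values.

```vbnet
Imports System

Module BlankPageCheck
    ' Hypothetical helper, not an SDK call.
    ' imageDepth is bits per pixel (1 = black and white); sizes are in bytes.
    Function IsLikelyBlank(imageDepth As Integer, imageDataSize As Long,
                           bwBlankMax As Long, otherBlankMax As Long) As Boolean
        If imageDepth = 1 Then
            Return imageDataSize < bwBlankMax
        ElseIf imageDepth > 1 Then
            Return imageDataSize < otherBlankMax
        End If
        ' No usable image depth: leave the page alone
        Return False
    End Function

    Sub Main()
        Console.WriteLine(IsLikelyBlank(1, 1500, 2000, 9000))   ' True
        Console.WriteLine(IsLikelyBlank(24, 8000, 2000, 9000))  ' True
        Console.WriteLine(IsLikelyBlank(24, 12000, 2000, 9000)) ' False
    End Sub
End Module
```

Inside a script you would pass oLFPageInfo.ImageDepth and oLFPageInfo.ImageDataSize to a function like this and only then decide what to do with the page.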


replied on March 3, 2021

Ben, thanks for that. I had forgotten about using scripting for page size. I think that would work great, and if you set the size threshold low enough then you won't accidentally delete good pages.


replied on March 3, 2021

As Ben stated and partly showed, you can check each page's PageInfo for image size and remove pages that are smaller than a predefined amount. Be careful not to set the value too high, or it will delete pages that have only a small amount of data on them. Here is a simple workflow script that deletes a page if it is a black-and-white image smaller than the value of BWBlankMax, or a non-black-and-white image smaller than the value of OtherBlankMax. This will catch some blanks, but if the page is a dirty scan (a lot of speckling) or has punched holes, a ragged edge, or a border, the size will be above the threshold and the page will not be deleted.

Here is the code for a VB SDK Script:

        Protected Overrides Sub Execute()
            'Write your code here. The BoundEntryInfo property will access the entry, RASession will get the Repository Access session
            If BoundEntryInfo.EntryType = EntryType.Document Then
                Dim sBWBlankMax As String = GetTokenValue("BWBlankMax")
                Dim sOtherBlankMax As String = GetTokenValue("OtherBlankMax")
                Dim iBWBlankMax As Integer
                Dim iOtherBlankMax As Integer
                ' Get Black and White Blank Page size value
                If Integer.TryParse(sBWBlankMax, iBWBlankMax) Then
                    ' Get non Black and White Blank Page size value
                    If Integer.TryParse(sOtherBlankMax, iOtherBlankMax) Then
                        Try
                            ' Cast the Bound Entry to DocumentInfo
                            Using oLFDocInfo As DocumentInfo = DirectCast(BoundEntryInfo, DocumentInfo)
                                ' Get PageInfoReader with pages from DocumentInfo
                                Using oLFPageInfoReader As PageInfoReader = oLFDocInfo.GetPageInfos()
                                    ' Iterate through each PageInfo
                                    For Each oLFPageInfo As Laserfiche.RepositoryAccess.PageInfo In oLFPageInfoReader
                                        ' Check for black-and-white image below the size threshold
                                        If oLFPageInfo.ImageDepth = 1 AndAlso oLFPageInfo.ImageDataSize < iBWBlankMax Then
                                            ' Delete pages whose image size is smaller than the set point
                                            oLFPageInfo.Delete()
                                            ' Check for non-black-and-white image below the size threshold
                                        ElseIf oLFPageInfo.ImageDepth > 1 AndAlso oLFPageInfo.ImageDataSize < iOtherBlankMax Then
                                            ' Delete pages whose image size is smaller than the set point
                                            oLFPageInfo.Delete()
                                        End If
                                        ' Save any changes to the PageInfo object
                                        oLFPageInfo.Save()
                                    Next
                                End Using
                            End Using
                        Catch ex As Exception
                            ' Report error message
                            WorkflowApi.TrackWarning(ex.Message)
                        End Try
                    End If
                End If
            End If
        End Sub
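Since a threshold that is too high will delete real pages, it may be worth doing a dry run first. The sketch below is only an illustration: PageSample and PagesBelowThreshold are hypothetical names (not SDK types), and the sample sizes are made up. It applies the same size rule as the script but only reports which page numbers would be removed.

```vbnet
Imports System
Imports System.Collections.Generic

Module BlankPageDryRun
    ' Hypothetical stand-in for the page properties the script reads
    Structure PageSample
        Public PageNumber As Integer
        Public ImageDepth As Integer
        Public ImageDataSize As Long
    End Structure

    ' Returns the page numbers the size rule would delete, without deleting anything
    Function PagesBelowThreshold(pages As IEnumerable(Of PageSample),
                                 bwBlankMax As Long,
                                 otherBlankMax As Long) As List(Of Integer)
        Dim result As New List(Of Integer)
        For Each p As PageSample In pages
            ' Pick the limit that matches the page's colour depth
            Dim limit As Long = If(p.ImageDepth = 1, bwBlankMax, otherBlankMax)
            If p.ImageDepth >= 1 AndAlso p.ImageDataSize < limit Then
                result.Add(p.PageNumber)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample sizes in bytes
        Dim pages As New List(Of PageSample) From {
            New PageSample With {.PageNumber = 1, .ImageDepth = 1, .ImageDataSize = 45000},
            New PageSample With {.PageNumber = 2, .ImageDepth = 1, .ImageDataSize = 1200},
            New PageSample With {.PageNumber = 3, .ImageDepth = 24, .ImageDataSize = 8500},
            New PageSample With {.PageNumber = 4, .ImageDepth = 24, .ImageDataSize = 30000}
        }
        ' Prints 2 and 3, one per line
        For Each n As Integer In PagesBelowThreshold(pages, 2000, 9000)
            Console.WriteLine(n)
        Next
    End Sub
End Module
```

Running something like this against a few known documents before enabling the delete should show whether the BWBlankMax and OtherBlankMax values are catching pages you want to keep.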


replied on March 2, 2021

I wish this were possible, and I hope I am wrong and that it is possible. The issue is that Workflow can't tell when a page is blank. Quick Fields can because it processes images, but Workflow doesn't look at the image pages of the documents.

We have tried to accomplish this in the past by running OCR on documents to see if any pages had no OCR text, but it wasn't reliable enough to delete the pages without worrying that we were deleting pages with logos, etc.

If you are OK with that level of certainty, using DCC to OCR the documents and then creating a workflow to delete pages without any OCR text might work for you.
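One possible way to reduce the logo problem, offered as an assumption rather than something tested here, is to require both conditions: no OCR text and a small image. In the sketch below, IsBlankCandidate is a hypothetical helper, pageText stands for whatever OCR text you have retrieved for the page, and the 9000 byte limit is only an example.

```vbnet
Imports System

Module OcrBlankCheck
    ' Hypothetical helper: a page is a deletion candidate only when OCR found
    ' no text AND the image is smaller than the size limit (in bytes).
    Function IsBlankCandidate(pageText As String, imageDataSize As Long,
                              sizeMax As Long) As Boolean
        Return String.IsNullOrWhiteSpace(pageText) AndAlso imageDataSize < sizeMax
    End Function

    Sub Main()
        Console.WriteLine(IsBlankCandidate("", 1500, 9000))            ' True
        ' A logo-only page: no OCR text, but the image is large
        Console.WriteLine(IsBlankCandidate("   ", 60000, 9000))        ' False
        Console.WriteLine(IsBlankCandidate("Invoice 123", 1500, 9000)) ' False
    End Sub
End Module
```

A page with only a logo would usually have no OCR text but a fairly large image, so it would be kept under this rule.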

