Does anyone have a workflow that removes blank pages from a multipage TIFF and reassembles the document without them? Quick Fields is not an option.
Question
Answer
William,
Here is some code for an SDK script that deletes 'blank' pages from the workflow's starting entry. NOTE: this script looks at each page's text, and if the page has no text it deletes that page. The obvious downside to this approach is that if the document has not been OCR'd, every page in it will be deleted!
A more robust solution, I think, would be to look at the image size and delete pages (images) that are smaller than a set threshold.
Later edit: Checking the image size instead of the text size is only a single line of code, so I added it to the code below as a commented-out line. I arbitrarily set the limit at 3000 bytes.
```vb
Protected Overrides Sub Execute()
    Try
        'Instantiate a document object and set it to the workflow starting entry...
        Dim document As LFDocument = Me.Entry

        'Instantiate a pages object and set it to the document pages...
        Dim docPages As LFDocumentPages = document.Pages

        'Since we will be stepping through the document and marking the 'blank'
        'pages, let's make sure all pages are unmarked first...
        docPages.UnmarkAllPages()

        'Now step through the document pages and look at the text object.
        'If the text object has a length of 0, mark the page for deletion...
        For Each page As LFPage In docPages
            'Replace the TextSize property with the ImageSize property to
            'look at image size instead of page text size, i.e.:
            'If page.ImageSize < 3000 Then
            If page.TextSize = 0 Then
                docPages.MarkPage(page)
            End If
        Next

        'Lock the object in preparation to delete the pages...
        docPages.LockObject(Lock_Type.LOCK_TYPE_WRITE)

        'Delete the marked pages...
        docPages.DeleteMarkedPages()

        'Save the updated pages object and unlock it...
        docPages.Update()
        docPages.UnlockObject()

        'Cleanup...
        document = Nothing
        docPages = Nothing
    Catch
        'NOTE: any error is silently swallowed here, and the pages may be left
        'locked. Consider logging or rethrowing so the workflow reports the failure.
    End Try
End Sub
```
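The script above only runs inside Workflow against the Laserfiche SDK, so here is a small Python model of just its page-selection step, showing how the text rule and the 3000-byte image rule differ on a document that was never OCR'd. The `Page` class and its `text_size` / `image_size` fields are hypothetical stand-ins for `LFPage.TextSize` / `LFPage.ImageSize`; this is a sketch of the logic, not SDK code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for the SDK page object (the real script reads
# page.TextSize / page.ImageSize from an LFPage).
@dataclass
class Page:
    number: int      # 1-based page number
    text_size: int   # bytes of page text (0 if the page was never OCR'd)
    image_size: int  # bytes of the page image

IMAGE_SIZE_THRESHOLD = 3000  # the arbitrary 3000-byte limit from the post

def blank_by_text(page: Page) -> bool:
    """Text rule: a page with no text counts as blank."""
    return page.text_size == 0

def blank_by_image(page: Page) -> bool:
    """Image rule: a page whose image is under the threshold counts as blank."""
    return page.image_size < IMAGE_SIZE_THRESHOLD

def pages_to_delete(pages: List[Page], use_image_size: bool = False) -> List[int]:
    """Return the page numbers that would be marked for deletion."""
    rule = blank_by_image if use_image_size else blank_by_text
    return [p.number for p in pages if rule(p)]

# Page 2 is a scanned blank; none of the pages has been OCR'd yet.
not_ocrd = [Page(1, 0, 48000), Page(2, 0, 1200), Page(3, 0, 52000)]
print(pages_to_delete(not_ocrd))                       # [1, 2, 3]: every page!
print(pages_to_delete(not_ocrd, use_image_size=True))  # [2]
```

A real threshold would need tuning against your own scans, since image size depends on resolution, compression and scanner noise.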
Replies
William,
Interesting question; if I were going to do this in Workflow, I would use an SDK script. The immediate issue is determining which pages are 'blank'. My first thought: if the document was OCR'd, look at the text object for each page and delete the page if it is empty. My second thought: look at the image size (in bytes) and delete any page whose image is smaller than a threshold (perhaps 3 KB?).
In either case, I would probably step through the document pages first, build a PageSet object of the pages to delete, and then make a single call to the DocumentInfo.DeletePage(PageSet) method.
If that would meet your needs, let me know and I can mock up some code snippets.
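One reason to collect the pages first and delete them in a single call is that page numbers shift as pages are removed. The Python sketch below (plain lists standing in for document pages; it does not use the SDK) shows that deleting pages one at a time by their original numbers removes the wrong page, while removing the whole collected set at once does not. Whether the SDK renumbers pages immediately after each individual delete is an assumption here, not something confirmed in this thread.

```python
from typing import List, Set

def delete_one_at_a_time(pages: List[str], blank_numbers: List[int]) -> List[str]:
    """Naive approach: delete each blank page by its ORIGINAL 1-based number,
    one call per page. Each deletion shifts the later pages up by one."""
    result = list(pages)
    for n in blank_numbers:
        del result[n - 1]
    return result

def delete_as_one_set(pages: List[str], blank_numbers: Set[int]) -> List[str]:
    """Collect the page numbers first, then remove them all in one pass,
    which is the idea behind passing a single page set to one delete call."""
    return [p for i, p in enumerate(pages, start=1) if i not in blank_numbers]

doc = ["p1", "blank", "p3", "blank", "p5"]
blanks = [2, 4]
print(delete_one_at_a_time(doc, blanks))    # ['p1', 'p3', 'blank']: wrong page removed
print(delete_as_one_set(doc, set(blanks)))  # ['p1', 'p3', 'p5']
```

Deleting one page at a time in descending page order would also avoid the shift, but a single set-based call keeps it to one repository update.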
(Then again, I might have totally over-thought your question and someone else can provide an easier way to accomplish this!)
Hi Cliff,
You have it correct. I have been trying to do this within the bounds of Workflow Designer, using Retrieve Document Text, etc.
I have no experience with SDK scripts; however, I have some basic knowledge of .NET programming and would be able to follow code snippets.
I would much appreciate it if you could provide some code snippets.
Cheers,
Bill
Workflow does not have any image processing capabilities. Like Cliff said, you could do it with a script, but a better tool for this type of job is Quick Fields.
Thanks for the reply, Miruna; however, as I said in the question, Quick Fields is not an option.
Using Workflow, I was able to build a workflow that removed blank pages from a collection of single-page TIFFs: I just used Retrieve Document Text and checked whether the token was empty. However, I ran into problems with multipage TIFFs, because I couldn't find a way to make Workflow look at each individual page.
You can look at individual pages of an entry using a Repeat activity with the condition: Page count (of the entry) is greater than or equal to %(Repeat iteration). Inside the Repeat loop, the %(Repeat iteration) token is the current page number. Make sure you start the iteration token at 1.
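As a rough model of that loop, here is a Python sketch that walks pages 1 through the page count, uses the iteration counter as the page number, and applies the same "empty text token" test used for single-page TIFFs. `retrieve_page_text` is a hypothetical stand-in for retrieving one page's text inside the loop, and the sketch does not model exactly when Workflow evaluates the Repeat condition. It only collects page numbers instead of deleting inside the loop, since deleting mid-loop would change both the page count and the numbering of later pages.

```python
from typing import Callable, List

def find_blank_pages(page_count: int,
                     retrieve_page_text: Callable[[int], str]) -> List[int]:
    """Visit pages 1..page_count the way the Repeat loop does, treating the
    iteration counter as the current page number."""
    blank_pages = []
    iteration = 1                      # start the iteration token at 1
    while page_count >= iteration:     # Page count >= %(Repeat iteration)
        text = retrieve_page_text(iteration)
        if not text.strip():           # empty (or whitespace-only) text => blank
            blank_pages.append(iteration)
        iteration += 1
    return blank_pages

# Hypothetical per-page OCR text for a 4-page TIFF; pages 2 and 4 are blank.
ocr_text = {1: "Invoice 1001", 2: "", 3: "Terms and conditions", 4: "  \n"}
print(find_blank_pages(len(ocr_text), lambda n: ocr_text[n]))  # [2, 4]
```

Treating whitespace-only text as blank is a choice made in this sketch, since OCR of an empty page can still return stray spaces or line breaks.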