Question

Workflow Script to Delete OCR\Extracted Document Text

asked on January 20

Hi everyone. I am working on a process that needs to OCR or extract the text of incoming documents and then do some processing on it. At the end of the process the OCR/extracted text is no longer needed, so I would like to delete it to free up storage space.

Does anyone have a Workflow script that does this and would be willing to share it?


Replies

replied on January 20

I have the following script. The original version removes page images, but the process is similar, so I just changed it to remove the text instead.

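            // Tracks whether the cleanup completed; written to the "Success" token at the end
            bool success = true;

            // Treat the bound entry as a document and lock it before editing
            DocumentInfo doc = (DocumentInfo)this.BoundEntryInfo;
            doc.Lock(LockType.Exclusive);

            try{
                // Iterate over every page in the document
                PageInfoReader pageReader = doc.GetPageInfos();
                foreach(PageInfo page in pageReader){
                    // Check whether the page has text and remove it
                    if(page.HasText){
                        page.ClearPagePart(PagePart.Text);
                    }
                }

                // Save changes
                doc.Save();
            }
            catch(Exception ex){
                // Flag for retry and log the error to the workflow instance
                success = false;
                WorkflowApi.TrackError(ex.Message);
            }
            finally{
                // Release the document whether or not the cleanup succeeded
                doc.Unlock();
                doc.Dispose();

                // Set output token so later activities can branch on the result
                SetTokenValue("Success",success);
            }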

 
