Question

Workflow and SDK - Change document text on a page by page basis

asked on August 7, 2016

Hello,

I have a 24-bit depth TIFF document stored in the repository. As one would expect, OCRing such a file does not yield good results, so a monochrome (1-bit depth) copy has been derived from it and OCRed into the repository. Now I need to copy the text from the monochrome document into the original 24-bit document, page by page, through a workflow/SDK combination.

Thanks in advance for any pointers.


Answer

SELECTED ANSWER
replied on August 8, 2016

I'm sure someone else can point out a better way to accomplish what you want, but if you go the SDK route, here is how you would copy the text/location data from one document to another:


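// Assumes the Laserfiche.RepositoryAccess namespace is imported and that
// `session` is an open, logged-in Session.
// srcDocID: the OCRed monochrome document (source of the text)
// dstDocID: the original 24-bit document (receives the text)
DocumentInfo srcDoc = Document.GetDocumentInfo(srcDocID, session);
DocumentInfo dstDoc = Document.GetDocumentInfo(dstDocID, session);

// Only copy when the page counts match, so pages line up one-to-one
if (srcDoc.PageCount == dstDoc.PageCount)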
{
    foreach (PageInfo srcPage in srcDoc.GetPageInfos())
    {
        PageInfo dstPage = dstDoc.GetPageInfo(srcPage.PageNumber);

        // Copy the page text
        using (var txtReader = srcPage.ReadTextPagePart())
            dstPage.WriteTextPagePart(txtReader.ReadToEnd());

        // Copy the word locations
        using (var locReader = srcPage.ReadLocationsPagePart())
        using (var locWriter = dstPage.WriteLocationsPagePart(srcPage.LocationsDataSize))
        {
            for (int i = 0; i < locReader.WordLocationCount; i++)
                locWriter.Write(locReader.Read());
        }

        dstPage.Save();
    }
}

 

replied on August 8, 2016

Many thanks, Robert. It works great!
