Question

Retrieving Text from a Header in Word?

asked on July 16, 2014

I'm attempting to retrieve text from a header in a Word document. Thus far, the only way I can get Quick Fields to recognize the header at all is to Generate Pages in the LF client first. Is there an easier way to do this, where I can automate the process or retrieve the header through different means? I'd like to make things as easy as possible for my end users, without them having to generate pages for every imported document. Thanks for your time!


Answer

SELECTED ANSWER
replied on July 18, 2014

Quick Fields relies on IFilters to extract text from Office documents. This Microsoft forum post seems to indicate that there are issues with the Word IFilter reading the header from some documents.


Replies

replied on July 16, 2014

Hi Luke,

 

Text Extraction should do the trick! ;)

replied on July 17, 2014

Thanks for the reply, Chris!  Unfortunately, because the text is part of the header in Word, it is not seen at all by Quick Fields until I Generate Pages.  Any other ideas, sir?  Or am I missing something simple here?  Thanks again!  :)

replied on July 17, 2014

Ah! Hmmmm.... :(

Tough call. Well, it looks like you're stuck with generating the pages first. The only other thing that might be possible would be to use the SDK with a script that could break the file up and get at the raw data somehow, with a FileStream or something? I'm not a code specialist, so I'm not sure what is or isn't possible here.

Maybe LF can help here, or someone with code knowledge?

The other thing to remember is that you can generate pages in Quick Fields, so this doesn't have to be done in the client first. Doing it all at the same time in Quick Fields should ease the pain slightly. ;)

replied on July 18, 2014

Thanks again for the tips, Chris. Your comment about QF generating pages intrigued me. Currently, I'm using QF to OCR quite a few Word documents, but as I said before, it simply does not pick up the header at the top of the document unless I generate pages in LF first. I know I could convert the docs to .PDF or another format so QF would see them, but I was hoping to find a way to simply pull it from the .DOC. Hopefully I'm making sense! I'm open to any further suggestions, as I'm still fairly new to this whole process. Thanks again!

replied on July 21, 2014

Thanks Miruna,

 

So, to wrap this up, if I'm understanding correctly: this is a Microsoft limitation, and the workaround, Luke, is to keep generating pages for the document as you're currently doing.

replied on July 21, 2014

Well, darn. I guess I'll try to look into a way to automate converting my .DOCs into .PDFs or something so I can streamline this for my end user. Thanks for your assistance, gentlemen!
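
For anyone landing here later: if the documents can be saved as .docx (Open XML) rather than legacy binary .DOC, the header text can be read straight out of the file without the Word IFilter. Below is a minimal sketch in Python using only the standard library. It is not a Laserfiche or Quick Fields API, and the function name `extract_header_text` is just a name made up for this example. It assumes the usual .docx layout, where headers are stored as `word/header1.xml`, `word/header2.xml`, and so on inside the zip package.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx header parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_header_text(docx_path):
    """Return the text of each header part in a .docx file, one string per part.

    Hypothetical helper for illustration; assumes an Open XML (.docx) file,
    not a legacy binary .doc.
    """
    headers = []
    with zipfile.ZipFile(docx_path) as archive:
        # Header parts are conventionally named word/header1.xml, header2.xml, ...
        names = sorted(
            name for name in archive.namelist()
            if re.fullmatch(r"word/header\d*\.xml", name)
        )
        for name in names:
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            # Join the text runs of each paragraph, skipping empty paragraphs.
            for para in root.iter(f"{{{W_NS}}}p"):
                text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
                if text:
                    paragraphs.append(text)
            headers.append("\n".join(paragraphs))
    return headers
```

Calling `extract_header_text("report.docx")` (the file name is a placeholder) returns a list with one entry per header part, since a document can have separate first-page, odd, and even headers. Getting that text into a Laserfiche field would still need a separate step, which this sketch does not cover.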
