Retrieving Text from a Header in Word?

replied on July 16, 2014

Hi Luke,

Text Extraction should do the trick!


replied on July 17, 2014

Thanks for the reply, Chris! Unfortunately, because the text is part of the header in Word, it is not seen at all by Quick Fields until I Generate Pages. Any other ideas, sir? Or am I missing something simple here? Thanks again! :)


replied on July 17, 2014

Ah! Hmmmm....

Tough call. Well, it looks like you're stuck with generating the pages first. The only other thing that might be possible would be to use the SDK with a script that breaks the file up and gets at the raw data somehow, with a FileStream or something similar. I'm not a code specialist, so I'm not sure what is or isn't possible here.
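For anyone curious what the "get at the raw data" idea could look like, here is a minimal sketch. It rests on a big assumption: the files are saved as .docx (which is a zip archive with the header text stored in parts named word/header1.xml, word/header2.xml, and so on), not the legacy binary .doc format discussed in this thread. It uses only the Python standard library rather than the Laserfiche SDK, and the function name `read_docx_headers` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph and text elements in .docx parts
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_headers(path):
    """Return the text of each header part in a .docx file, one string per part.

    Assumption: `path` points to a .docx file. Legacy .doc files are not zip
    archives and will raise zipfile.BadZipFile here.
    """
    headers = []
    with zipfile.ZipFile(path) as docx:
        for name in sorted(docx.namelist()):
            # Header parts are stored as word/header1.xml, word/header2.xml, ...
            if name.startswith("word/header") and name.endswith(".xml"):
                root = ET.fromstring(docx.read(name))
                paragraphs = []
                for para in root.iter(W_NS + "p"):
                    # Join the text runs that make up one paragraph
                    text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
                    if text:
                        paragraphs.append(text)
                headers.append("\n".join(paragraphs))
    return headers
```

Calling `read_docx_headers("report.docx")` (a hypothetical file name) would give back a list with one entry per header part, which a script could then hand off to whatever populates the fields.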

Maybe LF can help here or someone with code knowledge?

The other thing to remember is that you can generate pages in Quick Fields, so this doesn't have to be done in the client first. Doing it all at the same time in Quick Fields should ease the pain slightly.


replied on July 18, 2014

Thanks again for the tips, Chris. Your comment about QF generating pages intrigued me. Currently, I'm utilizing QF to OCR quite a few Word documents, but as I said before, it simply does not pick up the header at the top of the document unless I generate pages in LF first. I know I could convert the docs to PDF or another format so QF would see them, but I was hoping to find a way to simply pull it from the .DOC. Hopefully, I'm making sense! I'm open to any further suggestions, as I'm still fairly new to this whole process. Thanks again!


SELECTED ANSWER

replied on July 18, 2014

Quick Fields relies on IFilters to extract text from Office documents. This Microsoft forum post seems to indicate that there are issues with the Word IFilter reading the header from some documents.


replied on July 21, 2014

Thanks Miruna,

So to wrap this up, if I'm understanding correctly, this is a Microsoft limitation, and the workaround is what you're currently doing, Luke: generating pages for the document.


replied on July 21, 2014

Well, darn. I guess I'll look into a way to automate converting my .DOC files into PDFs or something so I can streamline this for my end user. Thanks for your assistance, gentlemen!
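One possible way to automate that conversion, sketched here purely as an illustration: batch-convert a folder of .doc files to PDF with LibreOffice in headless mode. This assumes LibreOffice is installed and its `soffice` executable is on the PATH, which nobody in this thread has confirmed for this setup. The function names and folder paths are hypothetical, and whether Quick Fields then picks up the header from the resulting PDFs would still need to be tested.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one document to PDF.

    Assumption: `soffice` is the LibreOffice executable and is on the PATH.
    """
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_folder(in_dir, out_dir):
    """Convert every .doc file in in_dir to a PDF written into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(Path(in_dir).glob("*.doc")):
        # check=True raises CalledProcessError if LibreOffice reports a failure
        subprocess.run(build_convert_command(source, out_dir), check=True)
```

Something like `convert_folder("incoming", "converted")` could then run on a schedule ahead of the Quick Fields session, so the end user never has to convert anything by hand.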
