I'm attempting to retrieve text from a header in a Word document. Thus far, the only way I can get Quick Fields to recognize the header at all is to Generate Pages in the LF client first. Is there an easier way to do this, where I can automate the process or retrieve the header through different means? I'd like to make things as easy as possible for my end user, without them having to constantly generate pages from imported documents. Thanks for your time!
Question
Answer
Quick Fields relies on IFilters to extract text from Office documents. This Microsoft forum post seems to indicate that there are issues with the Word IFilter reading the header from some documents.
Replies
Hi Luke,
Text Extraction should do the trick!
Thanks for the reply, Chris! Unfortunately, because the text is part of the header in Word, it is not seen at all by Quick Fields until I Generate Pages. Any other ideas, sir? Or am I missing something simple here? Thanks again! :)
Ah! Hmmmm....
Tough call. Well, it looks like you're stuck with generating the pages first. The only thing that might be possible would be to use the SDK with a script that could break the file up and get the raw data somehow, with a FileStream or something? I'm not a code specialist, so I'm not sure what is or isn't possible here.
Maybe LF can help here or someone with code knowledge?
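On the script idea, here is a minimal sketch, assuming the files are saved as .docx (Open XML) rather than legacy binary .doc. A .docx is a zip archive, and header text is stored in parts named word/header1.xml, word/header2.xml, and so on. The function name read_docx_headers is hypothetical, and this uses only the Python standard library, not the Laserfiche SDK or Quick Fields, so feeding the result into a field would still need to be wired up separately.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_docx_headers(path_or_file):
    """Return {part_name: header_text} for every header part in a .docx.

    Hypothetical helper: works on .docx only, not on binary .doc files.
    """
    headers = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith("word/header") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                paragraphs = []
                for para in root.iter(f"{{{W_NS}}}p"):
                    # Join all text runs inside one paragraph.
                    text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
                    if text:
                        paragraphs.append(text)
                headers[name] = "\n".join(paragraphs)
    return headers
```

Calling read_docx_headers("report.docx") (a placeholder file name) would return a dictionary with one entry per header part. A document can have separate first-page, odd, and even headers, which is why there may be more than one.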
The other thing to remember here is that you can generate pages in Quick Fields, so this doesn't have to be done in the client first. Doing it all at the same time in Quick Fields should help ease the pain slightly.
Thanks again for the tips, Chris. Your comment about QF generating pages intrigued me. Currently, I'm using QF to OCR quite a few Word documents, but as I said before, it simply does not pick up the header at the top of the document unless I generate pages in LF first. I know I could convert the docs to PDF or another format so QF would see them, but I was hoping to find a way to simply pull it from the .DOC. Hopefully, I'm making sense! I'm open to any further suggestions, as I'm still fairly new to this whole process. Thanks again!
Quick Fields relies on IFilters to extract text from Office documents. This Microsoft forum post seems to indicate that there are issues with the Word IFilter reading the header from some documents.
Thanks Miruna,
So to wrap this up, if I'm understanding correctly, this is a Microsoft limitation, and the workaround would be what you're currently doing, Luke: generating pages for the document.
Well, darn. I guess I'll try to look into a way to automate converting my .DOCs into PDFs or something so I can streamline this for my end user. Thanks for your assistance, gentlemen!
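For anyone landing here later, one possible way to batch the conversion is sketched below. It assumes LibreOffice is installed and its soffice executable is on the PATH (nothing in this thread confirms that setup), and the helper names build_convert_command and convert_folder are hypothetical. Headers are rendered onto each page of the PDF, so they should then show up as ordinary page text, but that is worth checking on a sample document first.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one document to PDF."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_folder(in_dir, out_dir, soffice="soffice"):
    """Convert every .doc/.docx file in in_dir and return the expected PDF paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    for doc in sorted(Path(in_dir).glob("*.doc*")):
        # check=True raises CalledProcessError if LibreOffice exits with an error.
        subprocess.run(build_convert_command(doc, out, soffice), check=True)
        converted.append(out / (doc.stem + ".pdf"))
    return converted
```

Something like convert_folder("incoming", "converted") (placeholder folder names) could run as a scheduled task ahead of the Quick Fields session, so the end user never has to generate pages by hand.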