Thousands of pages generated when saving MS Office docs

asked on December 4, 2018

Hi All,

This has been a known issue for a long time: when "Automatically extract text when saving document from Microsoft Office" is selected, saving a document can sometimes generate thousands of pages, which looks to the user like the import has hung. The workaround is to uncheck this option when saving Office documents, which is not really a long-term solution.

When I raised support cases for this in the past, I was told it was a known issue. I'm assuming the known issue is the one listed in the 10.3.1 release notes:

https://support.laserfiche.com/kb/1013948/release-notes-for-laserfiche-10-3-1

If a Microsoft Excel spreadsheet is imported to Laserfiche and has the same name as an existing document in the same folder, and you choose to save the document as a new version of the existing document, text generation may cause an exception in Microsoft Excel. (43261)

What I'm looking for is any kind of general update on when, or if, this will ever be resolved. I'm assuming it's an issue with Microsoft and how OCR is handled, but we are seeing this more and more, and simply stating 'it's a known issue' isn't really a solution for the customer.

Cheers!

replied on February 21, 2019

Hi there,

Has this been addressed? I am experiencing the same.

Thank you

replied on February 22, 2019

Nope...

replied on February 22, 2019

Hi Chris,

There shouldn't be any OCR process going on here; it's just a direct text extraction operation. Can you say more about why you need to extract text here? Full-text search should be able to update the search index of the document even if you don't update text, provided that text is not already present.

replied on February 25, 2019

Hi Justin,

Please look at case 196369; it might give you some more info.

replied on February 25, 2019

Ok, so basically the issue is the context hits?

replied on February 26, 2019

Hmmmm, not quite. Sometimes when you import an Excel document with OCR switched on (which it is by default), the user sees the import "taking a long time", when in fact what's actually happening in the background is that Laserfiche is generating thousands of pages for a single-page Excel document. I haven't managed to pinpoint the exact cause, but the solution from Laserfiche is to turn off the option above, which obviously isn't ideal, as people want to content-search documents saved from MS Office.

Interestingly, only Excel is affected by this issue; emails and Word documents don't seem to be.

P.S. I did see this issue affect emails in the past, but that was back in 8.2.x, and I haven't seen it since.

replied on February 26, 2019

Oh, I saw that that was the bug - I was wondering why you needed to generate text in the first place, since the documents will be full-text searchable without it. I was curious if it was so there will be context hits.

replied on February 26, 2019

Yes, the full-text search is useful for returning the documents, but the context hits give that extra layer for the user (when it works). For now we just tell them to turn it off, as it's on by default (it might be useful to change this to off by default).

Do we know what's going on with this, and when a fix might be available? We see this issue pop up often on the front line.

replied on March 5, 2019

I agree, Chris, this should be turned off by default. But, as you state, that is not ideal because you would like to have full context-hit capability. This issue seems to be persistent.

replied on March 6, 2019

The problem I see, Chris, with having it turned off is that no Office documents are OCR'd, and given we cannot run an overnight OCR routine, we are just building up a backlog of Word and Excel documents that someone has to OCR manually, and which they may not have access to.

We cannot ask users, once they have finished for the day, to find all the docs they have worked on and OCR them.

replied on March 6, 2019

Hi Paul,

I think there may be crossed wires here: even with the setting turned off, the documents are still text searchable; it just doesn't include the context hits (i.e. where LF shows the word within its sentence).

If you text search for these documents they will still be returned.

This of course doesn't excuse the fact that this needs to be resolved, but at least there is some functionality.

Cheers!

replied on September 12, 2019

Just wanted to add my voice to this issue, which we're also experiencing. Hoping for a resolution but will turn off the setting for now.

replied on May 1, 2020

We're experiencing this issue as well. With smaller spreadsheets (<100 KB) it can cause the import to take 5+ minutes to complete, but with large documents (>1 MB) it effectively hangs the application: I've waited over 20 minutes and it never completed the save.

Does anyone know if this is resolved in newer releases?

replied on May 1, 2020

Hi Andrew,

No. Nothing fixed yet...

SELECTED ANSWER

replied on May 4, 2020

Hi all, this issue was addressed in the 10.4.2 release. In testing, import time on both large and small spreadsheets dropped from minutes to seconds. On large files, the number of pages dropped from several thousand to an expected count of around 100. Listing this as an ongoing known issue in the 10.4.2 release notes was an error and will be corrected.
