We do not want the contents of documents to be searchable. I am importing documents to a repository by drag and drop from my local machine using the Windows client. I have my settings set to not extract text when saving documents from Microsoft Office and to not generate searchable text when importing documents, and when importing the documents I did not select the "generate searchable text" option. The Everyone settings are also set not to extract text. See screenshots below. When documents land in the repository, the contents of the Word documents (only) are searchable. What could be causing this?
Question
Issue With Searchable Text in Electronic Documents
Replies
The search engine does its own text extraction for Office documents.
I'm a little confused. Are you saying there is no way to prevent this? Why are there settings to ask if you want to extract the text?
The situation is a little confusing (it's not just you), and it has also evolved over time. The options you are pointing to are about extracting text or images and saving them as page data. Historically, this was what determined whether the contents of an edoc were findable with a search. The drawbacks to this approach include: the text can get out of sync with the edoc; the existence of pages can confuse users if they don't otherwise have a use for them; and you have to rely on users to generate the pages or else nobody can search for the document. So we've moved towards the search engine being able to build its search index without relying on users acting in a particular way. The result is that, for better or worse, individual users have less control over the searchability of a document.
All that said, if you say more about your use case maybe we can come up with a solution.
Our departments use the Web Client. Although you can use the search filters to be sure you are searching only document names and fields, the quick search bar searches anything searchable by default. The quick search is convenient and is the preferred method our departments use. Also, infrequent users, regardless of how they are trained, tend to go right there because it is the obvious search area.
Here's an example. Let's say I wish to search for "concrete" because I want to find any document related to a couple of projects with the word concrete in the project name (which would be used in the document name as well as in a document field). I will get any Word document that contains the text concrete. In a repository used to store thousands of procurement documents for a multitude of projects over the years, the word concrete would show up a whole lot, because the details of various projects would involve the use of concrete even though concrete is not part of the project name at all. I just tried this both ways in our development repository. The quick search returned 186 documents when I searched for concrete; when I used the search filters and chose only entry names and all fields, I got 55. Those 55 are documents related to the two actual concrete projects, i.e. what users would want in this case.
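For illustration only, the gap between the two result counts can be reproduced with a toy model. This is a minimal Python sketch, not Laserfiche code: the `Doc` class, the two search functions, and the sample documents are all hypothetical and only mimic the difference between searching everything and searching names and fields.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a repository entry (not a Laserfiche type)."""
    name: str
    fields: dict = field(default_factory=dict)
    content: str = ""


def matches_metadata(doc: Doc, term: str) -> bool:
    """True if the term appears in the entry name or in any field value."""
    t = term.lower()
    if t in doc.name.lower():
        return True
    return any(t in str(value).lower() for value in doc.fields.values())


def search_names_and_fields(docs, term):
    """Mimics a search restricted to entry names and fields."""
    return [d for d in docs if matches_metadata(d, term)]


def search_everything(docs, term):
    """Mimics a quick search that also looks inside document content."""
    t = term.lower()
    return [d for d in docs if matches_metadata(d, term) or t in d.content.lower()]


# Made-up sample data: only the first document belongs to a "concrete" project.
docs = [
    Doc("Concrete Bridge Repair - Bid",
        {"Project": "Concrete Bridge Repair"},
        "Bid tabulation for the repair work."),
    Doc("Parking Garage - Specs",
        {"Project": "Parking Garage"},
        "The deck uses reinforced concrete throughout."),
    Doc("Office Furniture - Quote",
        {"Project": "Office Furniture"},
        "Desks and chairs."),
]

everything_hits = search_everything(docs, "concrete")
metadata_hits = search_names_and_fields(docs, "concrete")
```

In this toy data, the content-inclusive search also returns the parking garage specs, which only mention concrete in the body text, while the names-and-fields search returns just the entry for the concrete project.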
This is our third repository, but the only one so far with many Word documents. In our other repositories, perhaps 5% of the documents are non-image files. In this one, about half of the documents will be MS Office documents, and they need to remain MS Office documents.
Is there a way we can somehow customize the search bar with "everyone attributes" (or any other way) to never return document content text? For this particular repository, we will never want to search document content text.
One other question I forgot to ask in my last reply... Wouldn't making the content of the documents text searchable impact search speed? Some of the Word documents in this repository will be 300-400 pages long.
I could not find another way to contact Laserfiche to ask about this aside from posting a comment on a previous post. I have been trying for over a week to post a new question on the Answers forum. Each time I get the error shown here, regardless of which browser I use. Can you help, Miruna?
We had an issue earlier where starting new threads was blocked, though it should be addressed now. Can you log out and back in and try again, please?
Logged out and back in, tried again. Same result as last week and earlier today. Here is the message:
We'll look into it.