The images are scanned via Laserfiche Scan and stored as TIFF files.
Question
Is there a way to search OCR text and exclude color images?
Answer
I am not sure what exactly you want help with.
Are you getting unexpected search results? If so, could you give specific examples? As Kenneth noted, if the OCR failed for those pages, then there will be no text and those pages will not be included in text searches; there are no steps you should have to take to exclude them.
Are you looking to exclude color images from a search that you are performing before you OCR? If so, you could try adding a page size limit to your search, because color pages tend to be larger.
Are you looking to troubleshoot why you cannot OCR those pages? If so, please either upload them (if they do not contain sensitive information) or open a support case and attach the images to the support case.
Replies
What do you have to work with? You could have Workflow analyze your repository for color documents and tag them, or add a field that indicates color. Then you could use that in your searches. I actually have a workflow that does just that. Let me know if you want to see it, and I can scrub it for security information and post it.
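As a rough illustration of the "detect color and tag it" idea, here is a minimal Python sketch that checks whether a TIFF file contains a color page by reading the PhotometricInterpretation tag (tag 262) of each page. It is not a Laserfiche Workflow and does not use any Laserfiche API; it assumes the TIFF pages have been exported to disk and read into memory, and the function names are hypothetical. Setting a field or tag on the matching entries would still have to be done separately.

```python
import struct

# PhotometricInterpretation (TIFF tag 262) values treated as color in this sketch.
# 0 and 1 are bilevel/grayscale; 2 = RGB, 3 = palette, 5 = CMYK, 6 = YCbCr, 8 = CIELab.
COLOR_PHOTOMETRIC = {2, 3, 5, 6, 8}
PHOTOMETRIC_TAG = 262


def tiff_page_photometrics(data: bytes) -> list:
    """Return the PhotometricInterpretation of each page (IFD), or None if the tag is absent."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    pages = []
    seen = set()  # guards against a corrupt file whose IFD chain loops
    while offset and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        photometric = None
        for i in range(count):
            entry = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            (tag,) = struct.unpack_from(endian + "H", data, entry)
            if tag == PHOTOMETRIC_TAG:
                # A single SHORT value sits left-justified in the 4-byte value field.
                (photometric,) = struct.unpack_from(endian + "H", data, entry + 8)
        pages.append(photometric)
        # The offset of the next IFD follows the last entry; 0 ends the chain.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return pages


def has_color_page(data: bytes) -> bool:
    """True if at least one page declares a color photometric interpretation."""
    return any(p in COLOR_PHOTOMETRIC for p in tiff_page_photometrics(data))
```

For an exported file you would call something like `has_color_page(open(path, "rb").read())`. The sketch treats palette images as color and does not handle BigTIFF files, so treat the result as a hint rather than a guarantee.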
Can you explain what you mean? Is it just that color images give you bad results when OCR'd, and you want users to have better default settings so this is minimized?
The color images are pictures stored as TIFF. There are quite a few of these images, and I'm trying to exclude them from the search results, because if you OCR these color pictures you get "Error reading file."
If they have an error, then they do not produce OCR information. That should only affect that page, though, not the entire document, unless the document is only one page.
They would automatically be excluded since no information was generated.
So you are saying that on the same page you have a color image and text, but you just want to get the text OCR'd? When is the OCR happening?
You say you are using LF Scanning, so you could use the color removal tool and then run OCR, if that would help, though it does not sound like it would.
Some are one page, some are not. I cannot remove the color because the images are required to be in color.
Well, this would really be a great use of Quick Fields to generate the searchable text. But you haven't confirmed that you want to capture text that is on the same page as an image, so I do not know whether the advanced functionality of Quick Fields, possibly together with Quick Fields Agent, would be worth the price if you do not already have it.
The OCR engine should be skipping the pages it has trouble with and continuing to the next. If it is having trouble with multiple pages and not going past those, then you may need to check the other OCR settings in the LF Client; you may find a combination of settings that gives you better results.