One frustration I have is having to generate pages in order to OCR scanned PDF content. This can be costly in storage, confuses staff with multiple file types, and leaves the PDF itself unsearchable if it is exported.
I'm wondering: has anyone implemented a process to OCR PDFs prior to ingestion? Do you use third-party software such as contentCrawler, and if so, have you found it worth the cost?
Thanks.