
Question

Generate Pages without OCR'd text on a PDF

asked on October 5, 2022

Is there a way to generate TIFF pages on a PDF document *without* also running OCR on it? The box to index (make text searchable) is greyed out whenever we generate pages, and deleting the OCR'd text after the fact is a non-solution.

indexing.JPG

Replies

replied on October 6, 2022

The options for PDF text extraction are under Tools\Options\Generate text\Advanced PDF options. You can set the method to "use native extraction method", which will only generate text pages if the PDF has a text layer.

There is no way to completely turn off text page generation if you're generating images.

"Index (make text searchable)" does not control page generation.

Out of curiosity, what is the reason you need images but not text?

replied on October 6, 2022

The PDFs have information that is redacted on the PDF layer, but once the document is OCR'd, the text layer allows users to highlight and copy/paste the redacted text. So they want to iron the PDF redactions into the LF pages to prevent people from getting at that data.
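
For anyone wanting to check up front which PDFs still carry selectable text under a visual redaction, here is a rough sketch in Python using only the standard library. It is a heuristic, not how Laserfiche decides anything: the function names are made up for illustration, it only understands uncompressed and Flate-compressed streams, and it can be fooled by encrypted files or by binary data that happens to resemble text operators.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# A text object (BT ... ET) that contains a text-showing operator (Tj or TJ).
TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def iter_decoded_streams(pdf_bytes):
    """Yield each stream body, Flate-decompressed when possible (hypothetical helper)."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj ignores the trailing newline that precedes "endstream".
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (or a filter this sketch does not handle): use as-is.
            yield raw


def looks_like_it_has_text(pdf_bytes):
    """Heuristic: True if any stream appears to draw text (hypothetical helper)."""
    return any(TEXT_OBJECT_RE.search(body) for body in iter_decoded_streams(pdf_bytes))
```

To try it on a file, read the file in binary mode and pass the bytes in, for example `looks_like_it_has_text(open(path, "rb").read())`. A `True` result suggests the PDF has a native text layer that may still expose the redacted words; a `False` result only means this sketch found none.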

 

Is there an easy way to delete OCR'd text for an entire document or set of documents?

replied on October 6, 2022

I see. Then your best bet might be to use Workflow and Distributed Computing Cluster for page generation from these PDFs. The Schedule PDF Page Generation activity does give you the option to not extract text at all.
