Why are pages generated for PDFs when I generate text?

replied on June 26, 2018

Take a look at your Advanced PDF Import Options:

In the Laserfiche Client, click Tools > Options > Generate Text > Advanced Settings for PDFs.

Make sure that it's set as follows:

Keep in mind that depending on how the PDF is composed, you may need different settings.

replied on June 26, 2018

But what if the PDF doesn't have a text stream? These are often PDFs that were created through scanning.

replied on June 26, 2018

You can either manually delete the pages, or you could set up a workflow to do it automatically. You could do a search in workflow for all of the appropriate documents and then walk through the list deleting pages.

Is there a specific reason for keeping the PDFs instead of using native Laserfiche pages? Are you using Laserfiche to scan, or some other product?

replied on June 26, 2018

Laserfiche pages are much larger than compressed PDFs, and there's no point in having two copies.

Right now our scanners are set to scan to our email, from which we drag and drop or use the ribbon to put the files into Laserfiche. We have Import Agent, which we used with our previous scanners (which scanned to a folder) to import. We may do this again, but will wait to see if we can use Quick Fields instead.

I suppose we can just delete the pages! That's a simple solution: thanks!

replied on June 26, 2018

That's interesting. We're usually able to get our scanned documents down to the same size as a PDF or smaller.

Anyway, I'm always curious about how folks are doing things. Thanks!

replied on June 26, 2018

So you scan as TIFFs? At what DPI?

replied on June 26, 2018

We scan black-and-white TIFFs at 300 DPI. I don't know how your locality is, but we are required to scan at 300 DPI and use lossless compression, which rules out lossy formats such as JPEG. So, for us, TIFF is the way to guarantee compliance. In the case of scanned documents, PDF may or may not be compliant.

replied on June 27, 2018

Ah, in Canada we don't have set requirements like that. If you scan in black and white, that's probably why your files are smaller.

replied on June 27, 2018

You can scan in color, but with lossless compression it's at least a meg per page. For almost everything, color is unnecessary. Also, having native Laserfiche pages gives a whole lot more flexibility.
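As a rough sketch of why color matters so much for page size, the arithmetic below computes the raw (uncompressed) pixel data for one page. The assumptions are mine, not from the thread: a US Letter page (8.5 x 11 in), 1 bit per pixel for black and white, 24 bits per pixel for color, and an illustrative function name. Lossless compression shrinks both figures, by an amount that depends on the page content.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw pixel data for one scanned page, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8  # round up to full bytes
    return bytes_per_row * height_px


# Assumed page size: US Letter, scanned at 300 DPI.
LETTER = (8.5, 11.0)

bw = uncompressed_page_bytes(*LETTER, dpi=300, bits_per_pixel=1)
color = uncompressed_page_bytes(*LETTER, dpi=300, bits_per_pixel=24)

print(f"B/W:   {bw:,} bytes")     # B/W:   1,052,700 bytes
print(f"Color: {color:,} bytes")  # Color: 25,245,000 bytes
```

Before any compression, the color page carries roughly 24 times as much data as the black-and-white page at the same resolution.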

replied on June 27, 2018

In general, the system works better with Laserfiche pages than with native PDFs. It will of course support both, but some functionality, such as the ability to place annotations directly on the page or to open to a specific highlighted location in the document from search result context hits, is only available on Laserfiche pages. Also, for those cases you note where there's no text stream in the PDF, the system needs to generate the image pages first in order to OCR them and generate the text anyway. Lastly, TIFFs can be retrieved one page at a time rather than as a whole document, which can reduce load times (assuming they aren't large color files, of course).

Devin's suggestions about scanning are pretty important - color vs. B/W makes a big difference when it comes to TIFF page size.

replied on June 27, 2018

Unfortunately, colour is pretty important in our organization, as our minutes, agendas, reports, etc. all use colour to indicate information about a document. As well, although Canadian law and standards don't refer to a specific DPI, they do refer to maintaining the integrity and image of the original, which would include colour.

All great points, though; it just doesn't work for us.

replied on June 27, 2018

Right, it's not an option for everyone. That's why we don't enforce it within the system and why we try to offer as much native support for PDFs as we can. Those specific architectural differences still remain, though.

I do have a question back to you, though. We've implemented the WCAG accessibility standards for our client applications, initially based on Canadian regulations, and I believe one of the items is that color can't be the sole method of identifying critical information. Now, that standard is specifically for software applications and not for the information itself, but it made me curious about your mention of color here for identifying information. Is there any sort of regulation regarding that use in the content itself? Mostly just trying to make sure I understand the regulations fully. Thanks!

Reference: https://www.w3.org/TR/UNDERSTANDING-WCAG20/visual-audio-contrast-without-color.html

replied on June 27, 2018

In particular, I'm referring to the CAN/CGSB-72.34-2017E Electronic Records as Documentary Evidence standard. This is a snippet:

It's not explicit what "loss of information" means, however, given our circumstances (e.g., colour representing context around the text in our documentation), I believe that not having the colour represented in the documentation would be a loss of information.

I hope that helps!

replied on June 27, 2018

We do have non-coloured text to indicate meaning (e.g., a "confidential" watermark) besides our coloured symbols (e.g., a red frame around confidential reports); however, because these documents are born digital, it doesn't make sense to convert them from PDF to TIFF (and B&W). Though in that case, they would have a text stream...

replied on June 27, 2018

If the documents start out digital, then it sounds like the application that converts or exports them to PDF is stripping out the text layer. I would check with that product's support to see if they can include the text layer in the PDFs it creates.

replied on June 27, 2018

Hi @████████, in this case, it wouldn't apply to born-digital .pdfs. It can pull the text stream fine.
