
Question

What is the best (fastest) way to import a HUGE PDF?

asked on September 21, 2016

Hello everyone!

We're working on a new process here. As of today, it involves producing a large, huge, ginormous, big PDF: 150,000+ pages big.

And, we need to import this PDF and index it so that the text inside can be searched.

We're just starting to test. Importing with the client has been running for almost 48 hours, and we're at about 60,000 pages imported and indexed so far.
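For a rough sense of scale, those figures (about 60,000 of 150,000 pages after roughly 48 hours) can be turned into a throughput and time-remaining estimate. This is a minimal sketch that assumes the rate stays constant, which it may not for a file this size.

```python
def import_eta(pages_done: int, hours_elapsed: float, total_pages: int) -> tuple[float, float]:
    """Return (pages per hour, estimated hours remaining), assuming a constant rate."""
    rate = pages_done / hours_elapsed
    remaining_hours = (total_pages - pages_done) / rate
    return rate, remaining_hours


# Numbers from the post: ~60,000 pages after ~48 hours, 150,000+ pages total.
rate, remaining = import_eta(60_000, 48, 150_000)
print(f"{rate:,.0f} pages/hour, about {remaining:.0f} more hours")
# → 1,250 pages/hour, about 72 more hours
```

At that pace the whole import would take about 120 hours (five days) end to end.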

We have nearly the entire Laserfiche toolkit at our disposal. What methods would you recommend?

Some of my possibilities:

  • Import with the client (er, no).
  • Use Import Agent or Quick Fields to import from a folder, then index.
  • Import with Workflow "For Each File," then index. (Not sure you can do this; maybe with scripting?)
  • Possibly use the Distributed Computing Cluster?

So, experts far and wide, what say you? Any thoughts on my predicament?

Thanks in advance!

Replies

replied on September 21, 2016

The various options (client/QF/IA) won't make much of a difference, because page generation is a CPU-bound operation (lots of image rendering) and none of the products do it in parallel. Version 10 introduced some pretty big performance enhancements to PDF page generation, but you would need to update your LF server to use the 10 client. If possible, do the page generation on the fastest CPU you have.

If the PDF has embedded text (not just images that need to be OCR'd), you could import it without generating pages or text, and the PDF itself should still be indexed.

replied on September 21, 2016

That's the second-largest PDF I've heard of today! Not really. That's a huge file.

You could try using Import Agent 10 to take advantage of multithreading on the machine, and that might help. You could also focus on importing first and then generating text and indexing at a later time. With a file that large, though, it might be worth assessing whether a single PDF with that many pages is necessary or whether it could be broken into smaller files. From a usability standpoint, I would find a 150,000+ page PDF rather daunting to face as a reader.
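If the PDF can be broken up, the first step is deciding on page ranges. The sketch below only computes 1-based, inclusive ranges for a chosen chunk size; the 5,000-page chunk size is an arbitrary assumption, and the actual splitting would be done with whatever PDF tool you have on hand.

```python
def page_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges of at most chunk_size pages."""
    if total_pages < 0 or chunk_size <= 0:
        raise ValueError("total_pages must be >= 0 and chunk_size must be > 0")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


# Hypothetical example: 150,000 pages in 5,000-page chunks gives 30 smaller files.
ranges = page_ranges(150_000, 5_000)
print(len(ranges), ranges[0], ranges[-1])
# → 30 (1, 5000) (145001, 150000)
```

The last range is clamped to the real page count, so a total that isn't a multiple of the chunk size just produces a shorter final file.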

Good luck with the process, though!

replied on October 4, 2018

Hi Michael, how did this go?

Hopefully it's finished by now, but if barcode or OCR page separation wasn't required, I would have used an open-source PDF product to split the document. That way you'd have several restore points in case of issues.
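The "restore points" idea can be sketched as a loop that records each chunk once it has been imported, so a failed run picks up where it stopped instead of starting over. This is a hypothetical sketch: `import_chunk` stands in for whatever actually does the import (Import Agent, Quick Fields, etc.), and the JSON progress file is an assumption, not a Laserfiche feature.

```python
import json
from pathlib import Path
from typing import Callable


def import_with_checkpoints(
    chunks: list[str],
    import_chunk: Callable[[str], None],
    state_file: Path,
) -> list[str]:
    """Import chunks in order, skipping any already recorded in state_file.

    Returns the names of the chunks imported during this run.
    """
    done: set[str] = set()
    if state_file.exists():
        done = set(json.loads(state_file.read_text()))

    imported_now = []
    for chunk in chunks:
        if chunk in done:
            continue  # finished in an earlier run; this is the "restore point"
        import_chunk(chunk)  # hypothetical import step; may raise on failure
        done.add(chunk)
        imported_now.append(chunk)
        # Persist progress after every chunk so a crash loses at most one chunk.
        state_file.write_text(json.dumps(sorted(done)))
    return imported_now
```

If the third file fails, rerunning the same call skips the first two and retries only the one that failed.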

I came to this post whilst looking for QF tuning and threading information, in case you're wondering.
