Question

Batch convert PDF to TIFF while retaining metadata

asked on February 9, 2021

I have uploaded PDF files to a repository, and I'm building batch processing around them. These PDF files have metadata attached. Is there a way to convert them to TIFF automatically as they appear, without user intervention? I need to retain the metadata on these documents while doing so.

Tools available to me include

  • On-premises Laserfiche installation, including
      • Quick Fields
      • Quick Fields Agent
      • Import Agent

If this is one of the capabilities that is supported in the cloud but not on-premises, I would like to know that as well.

Thank you!

Answer

SELECTED ANSWER
replied on February 9, 2021

It's complicated.

Since you have Import Agent, I'd export the PDF to a folder on the file system and point Import Agent (IA) at that folder. Give the exported file a name that includes the DocID of the original document. Set IA to generate pages and remove the PDF pages, and to bring the result back into a folder you are watching. Have your workflow loop for a few minutes while it waits for the file to appear. When it does, you can find the original document using the DocID in the file name of Doc #2, and then use the nifty Copy Metadata task to preserve your fields. I'd love to know if there's an easier way, but that's the best solution I could come up with for a similar situation.
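The "find the original by the DocID in the file name" step can be sketched roughly as below. This is only an illustration, assuming the exported file was named `<document name>_<DocID>.<extension>`; `ParseEntryId` is a hypothetical helper name, not part of any Laserfiche SDK, and it uses only standard .NET calls.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module EntryIdFromFileName
    ' Hypothetical helper: returns the numeric ID that follows the last
    ' underscore in a file name such as "Invoice 2021_12345.tif",
    ' or -1 if the name does not end in "_<digits>".
    Function ParseEntryId(fileName As String) As Integer
        ' Drop any folder part and the extension first.
        Dim baseName As String = Path.GetFileNameWithoutExtension(fileName)
        Dim m As Match = Regex.Match(baseName, "_(\d+)$")
        Dim id As Integer
        If m.Success AndAlso Integer.TryParse(m.Groups(1).Value, id) Then
            Return id
        End If
        Return -1
    End Function

    Sub Main()
        Console.WriteLine(ParseEntryId("Invoice 2021_12345.tif")) ' prints 12345
        Console.WriteLine(ParseEntryId("NoIdHere.tif"))           ' prints -1
    End Sub
End Module
```

The returned ID would then be what you search on to locate Doc 1 before running Copy Metadata. Returning -1 for names without an ID lets the workflow skip files that did not come from this export.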

replied on February 9, 2021

Wow, that was creative! Slightly tortured, but it would work, which is more than I can say for the things I've tried so far.

Thanks for the input!

replied on February 9, 2021

You are welcome - and I do wish there were a better way. In this setting we did not have QF and QF Agent, so Miruna's approach is definitely worth a try. It looks like QF has the same PDF conversion options (Scan, Configure Scan Source). For the page replacement options, go to Document Class Options, Document Merging.

That should preserve the metadata, and the PDF will also have TIFF pages associated with the PDF pages.

 

Late edit: You should also be able to retrieve the fields from the original document via the Capture Engine, so just create a new document (which will have the original metadata) and keep a link to Doc 1 via the doc ID so you can delete it once you save Doc 2.

 

Don't actually need this then:

Now, if you really need to get rid of the PDF, I'd add the entry ID (Doc 1) to the document name of Doc 2, and create a new document. On document creation, parse out the DocID, search for the original, and copy the metadata instead. In our setting we moved Doc 1 to a kind of holding folder, so we could drive the workflow (WF) that puts all of this together off the new (Doc 2) creation event.

Then you can delete Doc 1.

 

 

replied on February 9, 2021

And here's some code that will export the PDF:

            Dim LF_DocExporter As New DocumentExporter()
            'ExportPath must end with a path separator, since it is joined to the name as-is.
            Dim sExportPath As String = GetTokenValue("ExportPath")
            Dim LF_DocInfo As DocumentInfo = Me.BoundEntryInfo
            'The MimeType token is used here as the file extension.
            Dim sExt As String = GetTokenValue("MimeType")
            'The entry ID lets us identify the file and tie the metadata back together.
            'Dim sDocName As String = LF_DocInfo.Name & "_" & LF_DocInfo.Id
            'Or use this version to preserve the DCN
            Dim sDocName As String = GetTokenValue("Getthedocument_DCN")
            'Replace any slashes so they are not treated as path separators.
            sDocName = Replace(sDocName, "/", "-")

            'Export the electronic document to <export path><document name>.<extension>
            LF_DocExporter.ExportElecDoc(LF_DocInfo, sExportPath & sDocName & "." & sExt)

            'Return the exported file name to the calling rule
            SetTokenValue("ExportFileName", sDocName & "." & sExt)
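If document names can contain more than slashes (colons, question marks, and so on), a broader clean-up step might look like the sketch below. It is an assumption on my part rather than something from the script above: `ToSafeFileName` is a hypothetical helper name, and it relies only on the standard .NET `Path.GetInvalidFileNameChars` call, whose result depends on the operating system the script runs on.

```vbnet
Imports System
Imports System.IO

Module SafeFileNames
    ' Hypothetical helper: replaces every character the current platform
    ' rejects in a file name with a hyphen.
    Function ToSafeFileName(name As String) As String
        Dim result As String = name
        For Each c As Char In Path.GetInvalidFileNameChars()
            result = result.Replace(c, "-"c)
        Next
        Return result
    End Function

    Sub Main()
        ' The slash is replaced on every platform; other characters
        ' (such as the colon) are only replaced where they are invalid.
        Console.WriteLine(ToSafeFileName("2021/02/09 Invoice: A"))
    End Sub
End Module
```

In the export script this would stand in for the single `Replace` call on `sDocName`. If the name also carries the entry ID for the later lookup, the digits and underscore are left untouched, since neither is an invalid file name character.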

 

Replies

replied on February 9, 2021

Just set Quick Fields to get them using Capture Engine and use the %(path) token in the document class configuration to send them back where they came from. You can set Quick Fields to replace pages in existing documents and not touch metadata.

replied on February 9, 2021

I experimented with Quick Fields, but it didn't retain the metadata. Is there a way to make it do so?

replied on February 9, 2021

In the document class options, there's a section for merging documents, which applies if a document with a duplicate name is found in Laserfiche when Quick Fields sends its copy.

You can specify what you want to send as well as who wins when merging metadata.
