
Question

A PDF added from Laserfiche Forms has no text

asked on January 25, 2018

Hi all,

I have one PDF added from two different starting points ("test from LLForm.pdf" and "test from Windows.pdf").

 

 

When I add the PDF from Windows to Laserfiche, the file has "Text".

 

But the same PDF added from Laserfiche Forms to the same Laserfiche folder doesn't have it.

 

I don't understand why. I need this text, and I need it from Laserfiche Forms. How can I do this?

 

Thanks in advance.

Regards


Replies

replied on January 25, 2018

This is likely because your client is set to extract text from PDFs on import.

 

Bringing a PDF into the repository using any other method won't run through this step.

A couple of ways to make it work are to either have a Quick Fields agent process extract the text, or to use a PDF library called from a Workflow script.

A more squirrely way to do it would be to export the file to a file system folder, and have Import Agent bring it back in, extracting the text in the process.
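If you go the export-and-reimport route, one detail worth handling is that the import folder should never see a half-written file. Below is a minimal VB.NET sketch, assuming the Import Agent profile only picks up `*.pdf` files (check your own profile's filter): copy the file under a temporary `.partial` name, then rename it. `ExportForImport` and the folder paths are hypothetical, and exporting the document out of the repository itself is not shown.

```vbnet
Imports System
Imports System.IO

Public Module HotFolderExport
    ' Copies sourcePdf into importFolder so a folder watcher never sees a partial file.
    ' Assumption: the watcher (for example an Import Agent profile) only matches *.pdf,
    ' so the temporary ".partial" name is ignored until the final rename.
    Public Function ExportForImport(ByVal sourcePdf As String, ByVal importFolder As String) As String
        Directory.CreateDirectory(importFolder)   ' no-op if the folder already exists
        Dim finalPath As String = Path.Combine(importFolder, Path.GetFileName(sourcePdf))
        Dim tempPath As String = finalPath & ".partial"

        File.Copy(sourcePdf, tempPath, True)       ' overwrite a stale temp file if one exists
        If File.Exists(finalPath) Then
            File.Delete(finalPath)
        End If
        File.Move(tempPath, finalPath)             ' a rename on the same volume, so the .pdf appears in one step
        Return finalPath
    End Function
End Module
```

The rename is what matters here: the watched name only exists once the copy is complete.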

replied on January 25, 2018

Hi Devin,

 

Thank you for your reply.

 

My client doesn't have Import Agent, so I can't use it.

Regarding Quick Fields, does that require someone to run it manually?

At this point, Workflow looks like the best way. Do you have a workflow template?

 

Regards

replied on January 29, 2018

You could do it with an SDK Script activity in a workflow:

https://answers.laserfiche.com/questions/48724/Pdf-to-tiff-via-sdk-and-tiff-to-pdf-its-possible-

replied on January 29, 2018

Hi Rene,

 

Yes, I already read this article and tried it, but without success.

I get this error:

"Invalid statement in a namespace"

This is my code:

Imports System
Imports System.IO
Imports System.Runtime.InteropServices

Namespace WorkflowActivity.Scripting.Script
    '''<summary>
    '''Provides one or more methods that can be run when the workflow scripting activity is performed.
    '''</summary>
    Public Class Ghostscript

        ' Mirrors Ghostscript's gsapi_revision_t. product and copyright point to static
        ' strings owned by Ghostscript, so they are kept as IntPtr: declaring them As String
        ' would make the marshaler try to free memory it does not own.
        <StructLayout(LayoutKind.Sequential)> _
        Public Structure GSVersion
            Public product As IntPtr
            Public copyright As IntPtr
            Public revision As Integer
            Public revisionDate As Integer
        End Structure

        ' gsdll32.dll only loads into a 32-bit process; a 64-bit process needs gsdll64.dll.
        <DllImport("gsdll32.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.StdCall)> _
        Private Shared Function gsapi_revision(ByRef version As GSVersion, ByVal len As Integer) As Integer
        End Function

        <DllImport("gsdll32.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.StdCall)> _
        Private Shared Function gsapi_new_instance(ByRef pinstance As IntPtr, ByVal handle As IntPtr) As Integer
        End Function

        <DllImport("gsdll32.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.StdCall)> _
        Private Shared Function gsapi_init_with_args(ByVal pInstance As IntPtr, ByVal argc As Integer, <[In](), Out()> ByVal argv As String()) As Integer
        End Function

        <DllImport("gsdll32.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.StdCall)> _
        Private Shared Function gsapi_exit(ByVal instance As IntPtr) As Integer
        End Function

        <DllImport("gsdll32.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.StdCall)> _
        Private Shared Sub gsapi_delete_instance(ByVal pinstance As IntPtr)
        End Sub

        Public Shared Sub getVersion(ByRef version As GSVersion)
            gsapi_revision(version, Marshal.SizeOf(version))
        End Sub

        ' Runs Ghostscript with the given arguments; argv(0) is ignored, like a program name.
        Public Shared Sub run(ByVal argv As String())
            Dim inst As IntPtr = IntPtr.Zero
            Dim code As Integer = gsapi_new_instance(inst, IntPtr.Zero)
            If code <> 0 Then
                Throw New InvalidOperationException("gsapi_new_instance failed with code " & code)
            End If
            Try
                code = gsapi_init_with_args(inst, argv.Length, argv)
                gsapi_exit(inst)
            Finally
                gsapi_delete_instance(inst)
            End Try
            ' -101 (gs_error_Quit) is a normal result when -dBATCH ends the run.
            If code <> 0 AndAlso code <> -101 Then
                Throw New InvalidOperationException("Ghostscript failed with code " & code)
            End If
        End Sub

    End Class

    ' A Sub cannot be declared directly inside a Namespace; that is what raises
    ' "Invalid statement in a namespace". It has to live inside a class (here PdfConverter,
    ' or the Script class generated by the Workflow script editor) and be called from there.
    Public Class PdfConverter

        Public Shared Sub ToTIFFG4(ByVal sPDFPath As String, ByVal sOutputFolder As String)
            If String.IsNullOrEmpty(sPDFPath) OrElse String.IsNullOrEmpty(sOutputFolder) Then Return
            If Not File.Exists(sPDFPath) Then Return

            If Not Directory.Exists(sOutputFolder) Then
                Directory.CreateDirectory(sOutputFolder)
            End If

            ' "report.pdf" -> "report_G4.tiff"; GetFileNameWithoutExtension only strips the
            ' final extension, whereas String.Replace would hit every ".pdf" in the name.
            Dim sOutName As String = Path.Combine(sOutputFolder, Path.GetFileNameWithoutExtension(sPDFPath) & "_G4.tiff")
            If File.Exists(sOutName) Then
                File.Delete(sOutName)
            End If

            Dim gsVer As New Ghostscript.GSVersion()
            Ghostscript.getVersion(gsVer)
            If gsVer.revision > 900 Then
                ' Workflow scripts run on the server with no UI, so MsgBox is not used:
                ' exceptions propagate and are reported by the Script activity.
                Dim argv As String() = {"PDF2TIFF", "-q", "-sOutputFile=" & sOutName, "-dNOPAUSE", "-dBATCH", "-P-", _
                                        "-dSAFER", "-sDEVICE=tiffg4", "-r300", sPDFPath}
                Ghostscript.run(argv)
            End If
        End Sub

    End Class
End Namespace

 

replied on January 29, 2018

Keep in mind that this code appears to convert the PDF into TIFF images; it does not extract text.

One question I should ask: what do you want to do with the text you extract? Be aware that, depending on how the PDF was written, its internal structure may be wonky, and it can be tricky to get text that makes sense.
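For comparison, Ghostscript can also write out a PDF's text instead of rendering its pages: its `txtwrite` output device (available in Ghostscript 9.x) takes the same kind of argument array as the TIFF conversion in the posted code, so only the device and the output file change. A sketch, where `BuildTxtwriteArgs` is a hypothetical helper; the array it returns would be passed to that code's `Ghostscript.run`. The caveat about the PDF's internal structure still applies to whatever text comes out.

```vbnet
Imports System

Public Module GhostscriptText
    ' Builds an argument list that makes Ghostscript extract text with the
    ' txtwrite device instead of rendering pages with tiffg4.
    ' argv(0) is a placeholder program name; Ghostscript ignores it.
    Public Function BuildTxtwriteArgs(ByVal pdfPath As String, ByVal txtPath As String) As String()
        Return New String() {
            "PDF2TXT",
            "-q",
            "-dNOPAUSE",
            "-dBATCH",
            "-dSAFER",
            "-sDEVICE=txtwrite",
            "-sOutputFile=" & txtPath,
            pdfPath
        }
    End Function
End Module
```

Usage would look like `Ghostscript.run(BuildTxtwriteArgs("C:\temp\in.pdf", "C:\temp\in.txt"))`, after which the text file can be read and pattern-matched.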

 

replied on January 29, 2018

Hi Devin, thank you for your help.

 

It's a bit complicated.

 

My customer uses the PIXI software. In it, they fill in a form for each of their own customers. The form contains a lot of information (ID, last name, first name, date of birth, ...).

From the software, the agent prints a PDF file (I'll call it PDF1) and a paper copy to be signed.

PDF1 is named now().pdf (example: 20180129094012.pdf).

When the customer signs the form, the agent scans the signed form together with the attachments (passport, invoice, ...), which produces a new PDF (I'll call it PDF2).

PDF2 is also named now().pdf (example: 20180129094205.pdf).
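Because both names are only timestamps, the one thing a script can reliably read from them is when each file was produced. A minimal sketch, with `ParseScanTimestamp` as a hypothetical helper name: with the two example names above, PDF2 was created 113 seconds after PDF1. Whether such a gap is reliable enough to pair a form with its scan is an assumption this thread does not settle.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Public Module ScanNames
    ' PIXI names files now().pdf, e.g. "20180129094012.pdf" = 2018-01-29 09:40:12.
    ' Returns Nothing when the name does not follow that pattern (e.g. a renamed "1.pdf").
    Public Function ParseScanTimestamp(ByVal fileName As String) As DateTime?
        Dim stem As String = Path.GetFileNameWithoutExtension(fileName)
        Dim value As DateTime
        If DateTime.TryParseExact(stem, "yyyyMMddHHmmss", CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, value) Then
            Return value
        End If
        Return Nothing
    End Function
End Module
```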

 

In Laserfiche, I need to file the documents like this:

REPOSITORY \ <Customer> \ <Form_ID> \ PDF1

REPOSITORY \ <Customer> \ <Form_ID> \ PDF2

 

The form and the attachments need to be in the same folder, and I need to extract the information from the form.

 

Currently, my difficulties are:

#1. The agent doesn't rename the PDFs (1 and 2).

#2. Even when the agent renames the PDFs (1 and 2), they get it wrong (for example, 1234.pdf instead of 1235.pdf, or PDF1 gets 1.pdf and PDF2 gets 2.pdf).

#3. The page order of the scan is never the same (sometimes it is "form, passport, invoice", and sometimes "passport, form, invoice").
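Issue #3 can be handled by searching for the form's page instead of assuming its position, provided the scanned pages have text (which, for a scan, means OCR). A sketch that assumes the printed form always carries some fixed phrase; the `marker` argument is a hypothetical placeholder for whatever text actually appears on the PIXI form.

```vbnet
Imports System

Public Module FormPageFinder
    ' Returns the zero-based index of the first page whose text contains marker,
    ' or -1 if none does. The comparison ignores case, since OCR output can vary.
    Public Function FindFormPage(ByVal pageTexts As String(), ByVal marker As String) As Integer
        For i As Integer = 0 To pageTexts.Length - 1
            Dim text As String = If(pageTexts(i), "")   ' treat a page without text as empty
            If text.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Return i
            End If
        Next
        Return -1
    End Function
End Module
```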

 

My solution was to retrieve the information from PDF1 using a simple drag and drop + Workflow + pattern matching.
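The pattern-matching step can also be sketched in plain .NET. The labels "Customer:" and "Form ID:" are hypothetical, since the thread does not show the PIXI form layout; the returned folder follows the REPOSITORY \ <Customer> \ <Form_ID> layout described in the post. The same regular expressions could presumably be reused in Workflow's own pattern matching instead of a script.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module FormParser
    ' Hypothetical label formats: "Customer: DUPONT Jean" and "Form ID: 1235".
    Private ReadOnly CustomerPattern As New Regex("Customer:\s*(?<v>[^\r\n]+)", RegexOptions.IgnoreCase)
    Private ReadOnly FormIdPattern As New Regex("Form ID:\s*(?<v>\d+)", RegexOptions.IgnoreCase)

    ' Returns the target folder, or Nothing if either value is missing from the text.
    Public Function BuildTargetFolder(ByVal formText As String) As String
        Dim customer As Match = CustomerPattern.Match(formText)
        Dim formId As Match = FormIdPattern.Match(formText)
        If Not customer.Success OrElse Not formId.Success Then
            Return Nothing
        End If
        ' Laserfiche folder paths use backslashes: \<Customer>\<Form_ID>
        Return "\" & customer.Groups("v").Value.Trim() & "\" & formId.Groups("v").Value
    End Function
End Module
```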

PDF2 is more difficult. I don't know how to merge PDF2 into PDF1 without using:

Import Agent, QF Import, or dragging and dropping it into Laserfiche and renaming it.

 

My solution was to use a Laserfiche Forms web form with file upload fields. Using Workflow, we could get the information from attachment 1 (PDF1) and simply merge attachment 2 into attachment 1.

 

But attachments saved from the web form don't get text, so I can't use pattern matching on them.

 

I don't know if this is clear.

Don't hesitate to ask me for more information if you need it.

 

Regards

 

 

 

 

replied on January 29, 2018

Allowing the user to drag and drop a document is going to be the most consistent way to extract the text. You can configure the client to always show the metadata dialog when they drag a document into a folder. This would give them an opportunity to add metadata to the document. You could use this to find related documents and merge them.

If you are worried about data entry errors, that's where Workflow can help you out.

Here's one possible solution:

  1. User drops the first document onto the client
  2. Client displays the metadata dialog
  3. User enters metadata
  4. Workflow picks up the document, and if the user entered required fields, Workflow can validate their input against the PIXI database. If it can't find a matching record, it can route the document to a "needs correction" folder and send the user an email detailing what went wrong.
  5. The user drops the second document onto the client.
  6. Repeat step 4.
  7. Once both documents have been vetted, you can use the metadata to find both documents and merge them together.
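Steps 4 and 7 are where the logic lives, and both can be sketched together. In the sketch below an in-memory dictionary stands in for the PIXI database (a real script would query it instead), `Route` picks either the customer's folder or a "needs correction" folder, and `ReadyToMerge` lists Form IDs for which both the form and the scan have been vetted. All names, including the folder name and the "PDF1"/"PDF2" tags, are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module IntakeRules
    ' Stand-in for the PIXI database: Form ID -> customer name (hypothetical data).
    Public ReadOnly PixiRecords As New Dictionary(Of String, String) From {
        {"1235", "DUPONT Jean"}
    }

    ' Step 4: check the metadata the user typed and pick a destination folder.
    Public Function Route(ByVal formId As String, ByVal customer As String) As String
        If String.IsNullOrWhiteSpace(formId) OrElse String.IsNullOrWhiteSpace(customer) Then
            Return "\Needs Correction"   ' a required field was left empty
        End If
        Dim known As String = Nothing
        If Not PixiRecords.TryGetValue(formId.Trim(), known) OrElse
           Not String.Equals(known, customer.Trim(), StringComparison.OrdinalIgnoreCase) Then
            Return "\Needs Correction"   ' no matching PIXI record
        End If
        Return "\" & known & "\" & formId.Trim()
    End Function

    ' Step 7: Form IDs that have both a vetted form ("PDF1") and a vetted scan ("PDF2").
    ' Each tuple is (Form ID, document kind).
    Public Function ReadyToMerge(ByVal vetted As IEnumerable(Of Tuple(Of String, String))) As List(Of String)
        Return vetted.GroupBy(Function(d) d.Item1) _
                     .Where(Function(g) g.Any(Function(d) d.Item2 = "PDF1") AndAlso g.Any(Function(d) d.Item2 = "PDF2")) _
                     .Select(Function(g) g.Key) _
                     .ToList()
    End Function
End Module
```

Returning a folder path from `Route` keeps the decision testable on its own, separate from the Workflow activities that actually move the document.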

 

Does all of that make sense?

 

replied on January 30, 2018

Hi Devin,

 

#1: OK

#2: OK

#3: My customer doesn't want to populate metadata. In any case, this is not a problem; using Workflow, I can retrieve the data from the PDF.

#4: The document will always be OK, so we don't need this step.

#5: My customer doesn't want to populate metadata, so the second document can't be identified.

Because of #5, we can't do #6 and #7.

 

replied on February 1, 2018

Well, I can't help you if the users won't do their part. :)

If you don't mind slowing down the scanning process, you can have the Client OCR documents as they're scanned. That might help you. However, when you're doing pixel-based OCR as opposed to extracting text from a PDF, I recommend you still validate the data against a database. You never know if the OCR was completely accurate.
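Before comparing OCR output with the database, it can help to undo the most common letter/digit confusions in fields that are known to be numeric, such as a form ID. A small sketch; the substitution table (O to 0, I/l/| to 1, S to 5, B to 8) is a heuristic assumption, not something Laserfiche does, and the cleaned value still has to be validated against the PIXI data.

```vbnet
Imports System
Imports System.Text

Public Module OcrCleanup
    ' Replaces letters that OCR commonly confuses with digits. Only apply this
    ' to fields that should contain digits only, such as a form ID.
    Public Function NormalizeNumericField(ByVal ocrValue As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In ocrValue
            Select Case c
                Case "O"c, "o"c
                    sb.Append("0"c)
                Case "I"c, "l"c, "|"c
                    sb.Append("1"c)
                Case "S"c
                    sb.Append("5"c)
                Case "B"c
                    sb.Append("8"c)
                Case " "c
                    ' drop spaces that OCR sometimes inserts inside numbers
                Case Else
                    sb.Append(c)
            End Select
        Next
        Return sb.ToString()
    End Function
End Module
```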

replied on February 1, 2018

Thank you, Devin. We are going to get Import Agent. I think this is the best solution.
