Question

Possible Import Agent Bug: Page 1 ImageHeight/ImageWidth are 0

asked on March 3, 2022

Hello,

I have an SDK Script activity in workflow designed to catch and redirect corrupt images before documents are sent to the DCC for OCR.

My script checks for a missing page image or a height/width of 0.

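DocumentInfo doc = (DocumentInfo)this.BoundEntryInfo;
bool corruptImages = false;

foreach (PageInfo page in doc.GetPageInfos()) {
    // Flag the document if any page has no image or reports a zero height/width
    if (!page.HasImage || page.ImageHeight <= 0 || page.ImageWidth <= 0) {
        corruptImages = true;
    }
}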

We recently upgraded to LF 11 and updated Import Agent to the latest version, and now every imported document I check fails this validation.

Upon closer inspection, it seems that the documents coming in from Import Agent have an ImageHeight and ImageWidth of 0 for the first page despite having a valid image file.

Whenever I open an affected document in the client the issue is resolved and the document no longer has 0 values for height/width.

Additionally, I found that just using page.ReadPagePart(PagePart.Image) is enough to correct the missing values.

As a result, I think this may be a bug with Import Agent because documents from other sources don't appear to be affected.

Import Agent version 10.4.0193

 

UPDATE: After extensive testing I discovered that this seems to happen with every file that is brought in via XML in Import Agent.

The same PDF generates pages without issues if brought in directly through IA or the client, but with XML imports in IA I could reproduce the issue every time.


Replies

replied on March 3, 2022

What about the original files before Import Agent touches them? Do they have a width and height?

replied on March 3, 2022

Sorry, I should have specified that part.

The original files are PDF and we're generating pages with minimal settings, but the files and the underlying images that are generated have no issues; it only seems to be the ImageHeight and ImageWidth properties for the first page.

The issue didn't start until after the updates and nothing has changed with the original files or the source process. I suspected Import Agent because we have pages being generated by things like Forms uploads and they haven't been affected.

To test, I ran some scripts that would read the page part image and found that even when the ImageHeight/ImageWidth properties were 0, the underlying image did in fact have a valid height and width.

After running some more tests I found that previously failed documents would suddenly be valid, so I kept removing testing steps to figure out what was "correcting" the problem.

Once I narrowed it down, I added an SDK Script that just reads PagePart.Image, because once that has run and the script has ended I no longer get 0 values.

For example, if I read the page part and immediately check ImageHeight and ImageWidth in the same script, they still show as 0, even though a bitmap drawn from the read stream has non-zero dimensions. If I re-run the script, the values are fine on the second attempt.
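
The comparison described above, stored properties versus the dimensions decoded from the image itself, might look roughly like the sketch below. PageImageProbe and ReadActualSize are hypothetical names, not part of the Laserfiche SDK, and the sketch assumes the script project references System.Drawing. It copies the stream into memory first so that decoding does not depend on the source stream being seekable.

```csharp
using System.Drawing;
using System.IO;

// Hypothetical helper, not part of the Laserfiche SDK.
public static class PageImageProbe
{
    // Decodes the image bytes and returns the pixel dimensions found in the
    // image itself, independent of any stored page properties.
    public static Size ReadActualSize(Stream imageStream)
    {
        using (var buffer = new MemoryStream())
        {
            // Buffer the whole image so the decoder gets a seekable stream.
            imageStream.CopyTo(buffer);
            buffer.Position = 0;

            using (Image image = Image.FromStream(buffer))
            {
                return image.Size;
            }
        }
    }
}
```

Inside the script this would be called with the stream returned by page.ReadPagePart(PagePart.Image), assuming LaserficheReadStream can be passed wherever a System.IO.Stream is expected, and the result compared with page.ImageWidth and page.ImageHeight.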

My best guess is that something is causing an issue with the first page's attributes on import/creation, but opening it in the client or reading the page part image is enough for it to self-correct.

Running the following as a separate activity before the image validation script somehow fixed the problem so I have that as a workaround step for now.

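DocumentInfo doc = (DocumentInfo)this.BoundEntryInfo;

foreach (PageInfo page in doc.GetPageInfos()) {
    // Only touch pages that have an image but report a zero height or width
    if (page.HasImage && (page.ImageHeight <= 0 || page.ImageWidth <= 0)) {
        // Opening the image stream and disposing it right away was enough to correct the values
        using (LaserficheReadStream rs = page.ReadPagePart(PagePart.Image)) { }
    }
}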

 

replied on March 3, 2022

That's very different, thanks for the details. Import Agent has been updated to bring it up to date on PDF page generation fixes that we made for LF11. Does it happen for all PDFs? Does it happen if you generate pages through web/Windows client? Can you send some samples our way through your support people?

replied on March 3, 2022

I was having a hard time identifying a sample because every document I copied seemed to work just fine when tested with a separate import profile.

However, I kept testing and discovered it is only affecting PDFs imported using XML, which is why the same files seemed to work in other attempts.

Attached is a sample PDF I created, and below is the XML I used to finally reproduce the issue, but with our server name edited out of the import path.

<?xml version='1.0' encoding='utf-8'?>
	<LF:importengine xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://laserfiche.com/namespaces/importengine ImportEngine.xsd' xmlns:LF='http://laserfiche.com/namespaces/importengine' version='1.0'>
		<LF:toc on_document_conflict='unique' on_folder_conflict='unique'>
			<LF:document name='Test Document'>
				<LF:electronic_document content_type='application/pdf' extension='pdf' extract_text='false'>
					<LF:fileref ref='\\<servername>\D$\IA Test\TEST DOCUMENT.pdf' />
				</LF:electronic_document>
			</LF:document>
		</LF:toc>
	</LF:importengine>

I haven't tested with LST files, so it's possible they could be affected too.

replied on March 3, 2022

Don't you just hate it when that happens? ;)

So you're saying that this PDF will work fine if you generate pages while importing through the web/Windows client or IA as a separate doc, but if you generate pages when importing it with an XML list file, the image dimensions are not set right?

APPROVED ANSWER
replied on March 3, 2022

Exactly. It only seems to happen when generating pages on a document imported with an XML list file, and always the first page.

I doubt most people would even realize it is happening because when you open an affected document in the client it gets updated/corrected.

The only reason we had trouble is because I have a workflow script that validates the ImageHeight and ImageWidth attributes right after import.

replied on March 4, 2022

Hi Jason, the issue with XML import has been filed as bug 368550. I also verified that only the first page height and width is not 0 on the latest server, and it works for LST import.
