Question

Gathering a Page Number from a Text Search Context Hit

asked on April 6, 2018

Hello,

I want to check whether anyone has experience with the following problem.

In my workflow, I have a Search Repository activity that is set to search for all documents that contain a certain field value and also the OCR'd text value "Teacher 2013-14 Annual Review".

After that activity, I am trying to use the Retrieve Text activity to gather the OCR'd text from each document's pages as the documents go through a For Each Entry loop. I then want to use the retrieved text to determine which page the "Teacher 2013-14 Annual Review" context hit was on.

After that, I want to use a token that contains the page number in an assigned Tag.

 

I have attached a screenshot of my workflow process so far.

 

Any help at all would be greatly appreciated.

 

Thanks!

 

Retrieve Text WF.jpg

Replies

replied on April 9, 2018

Hi Charles,

Something similar to the workflow below should work to find the page number.

In the Retrieve Document Text activity, make sure you select "For each page, create a separate value in the multi-value Text token."

Set the 'For Each Value' activity to iterate over %(RetrieveDocumentText_Text). Because that token is multi-value, with one value per page, the loop visits each page in turn.

Pattern match on the text you are looking for. The conditional sequence below the pattern match checks whether the match succeeded. If it did, the page number is the current iteration number of the 'For Each Value' loop, %(ForEachValue_Iteration). You can use that to assign your metadata page number.
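
For readers who find the logic easier to follow as code, here is a minimal Python sketch of the loop described above. It is only an illustration, not Laserfiche Workflow code: `find_page`, `sample_pages`, and the sample strings are hypothetical, with the list standing in for the multi-value Text token (one string per page) and the 1-based counter standing in for %(ForEachValue_Iteration).

```python
import re


def find_page(pages, phrase):
    """Return the 1-based number of the first page containing phrase, or None."""
    # re.escape makes the phrase match literally rather than as a regex.
    pattern = re.compile(re.escape(phrase))
    # enumerate(..., start=1) mirrors a loop counter that begins at 1.
    for iteration, text in enumerate(pages, start=1):
        if pattern.search(text):
            return iteration
    return None


# Hypothetical sample data: one string per OCR'd page.
sample_pages = [
    "Cover sheet",
    "Teacher 2013-14 Annual Review\nEvaluator comments",
    "Signature page",
]
print(find_page(sample_pages, "Teacher 2013-14 Annual Review"))  # prints 2
```

Returning `None` when nothing matches corresponds to the branch of the conditional sequence where no page number is assigned.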

 

~ Andrew

 

 

replied on April 11, 2018

Thanks Andrew!

 

I will give this a try.

replied on April 11, 2018

Andrew,

Can you show me what token value you created in the first Assign Token activity, what your Conditional Sequence window looks like, and what you had in your Assign Token Values 2? I am still having some trouble getting my setup working. I would greatly appreciate it.

 

Thanks!

 

replied on April 12, 2018

My token, PageNum, exists only to hold the page number once the match is found; it has no other use. The second Assign Token Values activity could be replaced with a tag or metadata assignment.
I assigned the page number to it:

I did use a multi-value token just to check that I was not getting multiple matches.
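
The multiple-match check can be sketched the same way. This is a hypothetical Python illustration, not Workflow code: `matching_pages` and the sample data are made up, and the returned list plays the role of the multi-value token that collects every page number where the match succeeded.

```python
def matching_pages(pages, phrase):
    """Return every 1-based page number whose text contains phrase."""
    return [number for number, text in enumerate(pages, start=1) if phrase in text]


# Hypothetical sample data in which the phrase appears on two pages.
sample_pages = [
    "Teacher 2013-14 Annual Review",
    "Attendance summary",
    "Teacher 2013-14 Annual Review (continued)",
]
hits = matching_pages(sample_pages, "Teacher 2013-14 Annual Review")
print(hits)  # prints [1, 3]
if len(hits) > 1:
    print("Phrase found on more than one page")
```

A result with more than one entry means the phrase is not unique within the document, so a single page-number tag would be ambiguous.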

To check when the pattern match was successful in the Conditional Sequence:

Retrieving the document text does require that the documents have been OCR'd and that the text you are pattern matching against was read accurately.

This has worked well for finding the page number. Hope that helps get yours going.

 

 

replied on March 22, 2022

In case anyone needs to try this suggestion: as written, it will only work if your text is on page 1.

If your document has multiple pages, you need to index the Text token by the current iteration number in your pattern match, e.g. %(RetrieveDocumentText_Text#[%(ForEachValue_Iteration)]#)
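
As I read it, the expression above selects one page's text by the 1-based loop counter. A rough Python equivalent, under that assumption, is shown below; `page_text` and the sample list are hypothetical names, and Python's 0-based lists require subtracting 1 from the counter.

```python
def page_text(pages, iteration):
    """Return the text of the page for a 1-based iteration number."""
    if not 1 <= iteration <= len(pages):
        raise IndexError(f"no page {iteration} in a {len(pages)}-page document")
    # Python lists are 0-based, so subtract 1 from the 1-based counter.
    return pages[iteration - 1]


# Hypothetical sample data: one string per OCR'd page.
sample_pages = ["Cover sheet", "Teacher 2013-14 Annual Review", "Signature page"]
for iteration in range(1, len(sample_pages) + 1):
    if "Annual Review" in page_text(sample_pages, iteration):
        print(iteration)  # prints 2
```

The point of the explicit index is that each pass through the loop tests a different page rather than the same value every time.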

 

 
