Question

Looking To Combine Multi-Value Token Based on Condition or Edit Document Text In Workflow

asked on December 18, 2015

I have a workflow that retrieves document text and runs a pattern match so that, if certain criteria are met, the text is placed into a metadata field. This succeeds about 90% of the time, and I would like recommendations for handling the other 10%. Here is an example of the text that I am extracting:

 

 Commodity Real Return Fund         USD           11,037.528        5.92           65,342.17   1.000000       65,342.17
     Inst Acc
      Credit Absolute Return Fund          USD           13,262.599           11.07          146,816.97   1.000000      146,816.97
     Inst Acc
      Diversified Income Fund          USD             8,424.600           19.36          163,100.26   1.000000      163,100.26
     Inst Acc
      Emerging Markets Bond Fund         USD             2,583.979           38.31           98,992.24   1.000000       98,992.24
  

The pattern that I am using is .*USD.................    

The field value that I get, which is what I want, is:

Commodity Real Return Fund         USD           11,037.528

Credit Absolute Return Fund          USD           13,262.599

Diversified Income Fund          USD             8,424.600 

Emerging Markets Bond Fund         USD             2,583.979

The problem is that sometimes the fund name is long enough to wrap onto a second line, so the text I am extracting from looks like this:

      Emerging Markets Short-Term Local Ccy
      Fd                    USD            11,872.891        11.76          139,625.20   1.000000      139,625.20
     Inst Acc

 

As you can tell, the result is only the second line:

Fd                    USD            11,872.891

 

Is there any way I can run a conditional parallel that says: if the text is "Emerging Markets Short-Term Local Ccy", then combine it with the next multi-value token to create the desired field value?

Thanks in advance!


Answer

SELECTED ANSWER
replied on January 8, 2016

We found a good RegEx to put into the pattern match to handle this.

 

((?:\s[\w-]+)+)\s*(\s\w+\s)\s+(USD\s)\s+([\d,\.]+)

 

Seems to do the trick.
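
For anyone who wants to try this pattern outside Workflow, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Pattern Matching activity for these constructs, which may not hold exactly, and the `extract` helper and `SAMPLE` text are made up for illustration (the sample reuses lines from the question). Whitespace is collapsed so the results are easy to compare.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(r"((?:\s[\w-]+)+)\s*(\s\w+\s)\s+(USD\s)\s+([\d,\.]+)")

# Illustrative text: one single-line fund and one fund whose name wraps.
SAMPLE = (
    "      Diversified Income Fund          USD             8,424.600           19.36          163,100.26   1.000000      163,100.26\n"
    "     Inst Acc\n"
    "      Emerging Markets Short-Term Local Ccy\n"
    "      Fd                    USD            11,872.891        11.76          139,625.20   1.000000      139,625.20\n"
    "     Inst Acc\n"
)


def extract(text):
    """Return each full match with runs of whitespace collapsed to one space."""
    return [" ".join(m.group(0).split()) for m in PATTERN.finditer(text)]


for value in extract(SAMPLE):
    print(value)
```

The name words are matched with \s rather than a literal space, so the match can continue across the line break, while the "Inst Acc" lines drop out because no USD column follows them.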


Replies

replied on December 21, 2015

Pattern Matching relies on, well, a pattern to the data. Is there a way you can describe when the data flows to 2 lines? Or does a human have to read it to make that decision?

replied on December 21, 2015

The text that will be in the line above when it flows into two lines will fall into one of four patterns.  Only when one of those four patterns is present does it need to flow into two lines and those patterns will never be present when the data is on one line.  
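
Since the wrapped first lines are a known, fixed set, one option is to list them in the pattern as an optional prefix. Below is a rough sketch in Python, assuming Python's `re` behaves like Workflow's pattern matching for these constructs. `OVERFLOW_PREFIXES`, `build_pattern` and `extract` are hypothetical names, and only the one fragment shown in this thread is listed; the other three would need to be added.

```python
import re

# Hypothetical list: four fragments exist, but only this one appears in the
# thread, so the remaining three are left to be filled in.
OVERFLOW_PREFIXES = ["Emerging Markets Short-Term Local Ccy"]

SAMPLE = (
    "      Diversified Income Fund          USD             8,424.600           19.36          163,100.26   1.000000      163,100.26\n"
    "     Inst Acc\n"
    "      Emerging Markets Short-Term Local Ccy\n"
    "      Fd                    USD            11,872.891        11.76          139,625.20   1.000000      139,625.20\n"
    "     Inst Acc\n"
)


def build_pattern(prefixes):
    """Optional known prefix (plus the line break), then one 'name USD amount' line."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return re.compile(r"(?:(" + alternatives + r")\s+)?([^\n]*?USD\s+[\d,]+\.\d+)")


def extract(text, prefixes=OVERFLOW_PREFIXES):
    """Return each match with runs of whitespace collapsed to one space."""
    pattern = build_pattern(prefixes)
    return [" ".join(m.group(0).split()) for m in pattern.finditer(text)]


for value in extract(SAMPLE):
    print(value)
```

A listed prefix is only joined to the line that follows it, so single-line funds are matched exactly as before.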

replied on December 21, 2015

Hi Jeff,

You can start your pattern with (.{n,})?

This optionally matches a run of at least 'n' characters on a single line, so it picks up the first line of a wrapped fund name. For your case, it looks like the name wraps onto a new line when it is longer than about 35 characters. This also skips over those pesky "Inst Acc" lines, which are much shorter than that.

After that, capture the newline character with (\s)* and then the text leading up to and including USD with (.*USD\s*) (this also grabs any trailing spaces).

To capture the amount, you can use a more focused pattern like (\d{1,3})(,\d{3})*(\.\d{3})

Altogether, the pattern would look like (.{35,})?(\s)*(.*USD\s*)(\d{1,3})(,\d{3})*(\.\d{3})

Depending on your fund names and when a new line is created, you might need to adjust the '35'.
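
To see how the threshold affects the result, here is a small Python sketch. It assumes Python's `re` behaves like Workflow's pattern matching for this pattern, and `extract` and `min_len` are made-up names for illustration. The sample reuses lines from the question, and whitespace is collapsed in place of the Trim step.

```python
import re

SAMPLE = (
    "      Diversified Income Fund          USD             8,424.600           19.36          163,100.26   1.000000      163,100.26\n"
    "     Inst Acc\n"
    "      Emerging Markets Short-Term Local Ccy\n"
    "      Fd                    USD            11,872.891        11.76          139,625.20   1.000000      139,625.20\n"
    "     Inst Acc\n"
)


def extract(text, min_len=35):
    """Apply the combined pattern with a configurable line-length threshold."""
    pattern = re.compile(
        r"(.{%d,})?(\s)*(.*USD\s*)(\d{1,3})(,\d{3})*(\.\d{3})" % min_len
    )
    # Collapse runs of whitespace (including the line break) to one space.
    return [" ".join(m.group(0).split()) for m in pattern.finditer(text)]


print(extract(SAMPLE))      # threshold of 35: the wrapped name is included
print(extract(SAMPLE, 60))  # threshold too high: only "Fd USD ..." is returned
```

In this sample the wrapped first line is 43 characters long including its indentation, so a threshold of 35 picks it up and a threshold of 60 does not.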

When you assign the matches to a multi-value token, you can use the Token Editor to apply the 'Trim' function, which removes all leading and trailing whitespace.

Let me know if this works for you!
