Question

Looking To Combine Multi-Value Token Based on Condition or Edit Document Text In Workflow

Workflow

Updated January 8, 2016

asked on December 18, 2015

I have a workflow that retrieves document text and runs a pattern match so that, if certain criteria are met, the text is placed into a metadata field. This succeeds about 90% of the time, and I'd like recommendations for handling the other 10%. Here is an example of the text that I am extracting:

Commodity Real Return Fund    USD        11,037.528       5.92    65,342.17 1.000000    65,342.17
    Inst Acc
Credit Absolute Return Fund     USD        13,262.599    11.07     146,816.97 1.000000     146,816.97
    Inst Acc
Diversified Income Fund         USD        8,424.600    19.36     163,100.26 1.000000     163,100.26
    Inst Acc
Emerging Markets Bond Fund    USD        2,583.979    38.31    98,992.24 1.000000    98,992.24

The pattern that I am using is .*USD.................

The field values that I get, which are what I want, are:

Commodity Real Return Fund USD 11,037.528

Credit Absolute Return Fund USD 13,262.599

Diversified Income Fund USD 8,424.600

Emerging Markets Bond Fund USD 2,583.979

The problem is that sometimes the fund name is so long that it wraps onto a second line, so the text I am extracting from looks like this:

Emerging Markets Short-Term Local Ccy
Fd USD 11,872.891 11.76 139,625.20 1.000000 139,625.20
Inst Acc

As you can tell, the result is:

Fd USD 11,872.891
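
For anyone who wants to reproduce this outside Workflow, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Workflow pattern matcher for this case, and it swaps the fixed run of dots for an explicit amount (`\s+[\d,.]+`) so the result does not depend on exact column spacing; both are assumptions, not the pattern from the post. It shows why the wrapped name is lost: `.` does not match a line break, so `.*USD` can only reach back to the start of the line that contains USD.

```python
import re

# Sample text from the question. The last fund name wraps onto a second line.
text = (
    "Emerging Markets Bond Fund    USD        2,583.979    38.31    98,992.24\n"
    "Emerging Markets Short-Term Local Ccy\n"
    "Fd USD 11,872.891 11.76 139,625.20 1.000000 139,625.20\n"
    "Inst Acc\n"
)

# Assumed stand-in for the pattern in the question: ".*USD" is unchanged, but
# the fixed run of dots is replaced by whitespace plus an amount.
line_pattern = re.compile(r".*USD\s+[\d,.]+")

matches = [m.group(0) for m in line_pattern.finditer(text)]
for value in matches:
    print(repr(value))
```

The second match starts at "Fd" because the first half of the name sits on the previous line, which `.*` cannot reach.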

Is there any way I can run a conditional parallel that says: if the text is Emerging Markets Short-Term Local Ccy, then combine it with the next multi-value token to create the desired field value?

Thanks in advance!


Answer

SELECTED ANSWER

replied on January 8, 2016

We found a good RegEx to put into the pattern match to handle this.

((?:\s[\w-]+)+)\s*(\s\w+\s)\s+(USD\s)\s+([\d,\.]+)

Seems to do the trick.
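
To see what this pattern captures, here is a minimal sketch in Python rather than in Workflow. It assumes Python's `re` module treats this expression the same way the Workflow pattern matcher does, and that the raw document text keeps its column spacing (two or more whitespace characters on each side of USD, as in the first sample in the question). The sample text and the `tidy` helper are made up for illustration.

```python
import re

# Pattern from the answer above, unchanged.
pattern = re.compile(r"((?:\s[\w-]+)+)\s*(\s\w+\s)\s+(USD\s)\s+([\d,\.]+)")

# Assumed sample: column-spaced text with one wrapped fund name. The leading
# newline matters because the pattern wants whitespace before the first word.
text = (
    "\n"
    "Emerging Markets Bond Fund    USD        2,583.979    38.31\n"
    "Emerging Markets Short-Term Local Ccy\n"
    "Fd    USD        11,872.891    11.76\n"
)

def tidy(match):
    """Join the four captured groups and collapse runs of whitespace."""
    return " ".join("".join(match.groups()).split())

values = [tidy(m) for m in pattern.finditer(text)]
for value in values:
    print(value)
```

Because `\s` also matches line breaks, the first group can run across lines, which is what lets it pick up the wrapped name. By the same logic it could also pull in a short preceding line such as "Inst Acc", so it is worth checking the matches against real document text.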


Replies

replied on December 21, 2015

Pattern Matching relies on, well, a pattern to the data. Is there a way you can describe when the data flows to 2 lines? Or does a human have to read it to make that decision?


replied on December 21, 2015

When the data flows onto two lines, the text on the first line always matches one of four patterns. Those patterns appear only when the data wraps; they are never present when the data is on one line.
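
One way to picture the "combine with the next line" idea is to do the join on the text before any pattern matching. The sketch below is in Python, not a Workflow activity, and `WRAPPED_NAME_LINES` and `join_wrapped_lines` are hypothetical names. Only one of the four wrapped-name patterns appears in this thread, so the set holds just that one.

```python
# Hypothetical lookup of first-line texts that signal a wrapped fund name.
# Only this one is quoted in the thread; the other three are not given.
WRAPPED_NAME_LINES = {"Emerging Markets Short-Term Local Ccy"}

def join_wrapped_lines(text):
    """Merge each known wrapped-name line with the line that follows it."""
    merged = []
    pending = None  # wrapped name still waiting for its continuation line
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line:
            continue
        if pending is not None:
            merged.append(pending + " " + line)
            pending = None
        elif line in WRAPPED_NAME_LINES:
            pending = line
        else:
            merged.append(line)
    if pending is not None:
        merged.append(pending)  # wrapped name with nothing after it
    return merged

sample = (
    "Emerging Markets Short-Term Local Ccy\n"
    "Fd USD 11,872.891 11.76\n"
    "Inst Acc\n"
)
joined = join_wrapped_lines(sample)
print(joined)
```

After the join, a single-line pattern sees the full fund name on the same line as USD.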


replied on December 21, 2015

Hi Jeff,

You can start your pattern with (.{n,})?

This will match any line of at least 'n' characters. For your case, it looks like a fund name wraps onto a new line when it is longer than about 35 characters. This also skips over those pesky "Inst Acc" lines.

After that, capture the newline character with (\s)* and then the text leading up to and including USD with (.*USD\s*) (this also grabs any trailing spaces).

To capture the amount, you can use a more focused pattern like (\d{1,3})(,\d{3})*(\.\d{3})

Altogether, the pattern would look like (.{35,})?(\s)*(.*USD\s*)(\d{1,3})(,\d{3})*(\.\d{3})

Depending on your fund names and when a new line is created, you might need to adjust the '35'.

When you assign the matches as a multi-value token, you can use the Token Editor to apply the 'Trim' function, which removes all leading and trailing whitespace.
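
A quick way to try this pattern before putting it into Workflow is a small Python sketch. It assumes Python's `re` module handles the expression the same way as the Workflow pattern matcher. The sample text is adapted from the question, and collapsing whitespace with `split`/`join` is a stand-in for the cleanup step (it also squeezes inner spaces and the line break, which goes further than the trim described above).

```python
import re

# Pattern from this reply, with 35 as the minimum length of a wrapped line.
pattern = re.compile(r"(.{35,})?(\s)*(.*USD\s*)(\d{1,3})(,\d{3})*(\.\d{3})")

# Assumed sample: one single-line entry and one wrapped entry.
text = (
    "Diversified Income Fund         USD        8,424.600    19.36\n"
    "    Inst Acc\n"
    "Emerging Markets Short-Term Local Ccy\n"
    "Fd USD 11,872.891 11.76\n"
    "    Inst Acc\n"
)

# Take each whole match and collapse runs of whitespace, line breaks included.
values = [" ".join(m.group(0).split()) for m in pattern.finditer(text)]
for value in values:
    print(value)
```

In this sample the optional first group only does work on the wrapped entry; on the single-line entry it matches nothing and the rest of the pattern captures the name and amount.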

Let me know if this works for you!


