I have a workflow that successfully finds text matching a regular expression in a document and copies that text to the parent folder. What I need it to do is find the line that starts with a token value and copy the matching text from that same line. Can this be done?
Need Workflow to find a token value (specific numbers) in document text and retrieve the next word on the same line (regular expression)
Replies
You'll likely need a Pattern Matching activity. Inline regular expressions don't resolve token values in the pattern.
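The idea behind "put the token in the pattern" can be sketched outside Workflow. This is a minimal Python illustration, not Workflow's own API: the helper `rest_of_line` and the sample page text are hypothetical, and Python's `re` module is assumed to behave like the Workflow regex engine for these simple constructs. It builds the pattern from the token value, anchors it to the start of a line, and captures the rest of that line.

```python
import re
from typing import Optional


def rest_of_line(page_text: str, token_value: str) -> Optional[str]:
    """Return the text after token_value on the line that starts with it.

    Hypothetical helper: imitates what Pattern Matching does once the token
    has been expanded into the pattern. It is not part of Workflow.
    """
    # re.escape makes the token literal, so a '.' or '-' in it is not a metacharacter.
    # [ \t]+ (rather than \s+) keeps the match from stepping over a line break.
    pattern = r"^" + re.escape(token_value) + r"[ \t]+([^\r\n]+)"
    # re.MULTILINE lets '^' match at the start of every line, not only the first.
    match = re.search(pattern, page_text, re.MULTILINE)
    return match.group(1) if match else None


# Hypothetical page text; the layout and the 158.5 value are made up.
page = "370100 SW-01-044-13-4 163.0\r\n365700 SE-28-043-13-4 158.5\r\n"
print(rest_of_line(page, "365700"))  # SE-28-043-13-4 158.5
```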
Hi Miruna: I have been working on this workflow after adding the pattern matching and I am so close I can taste success!!
My pattern matching worked on one test and the workflow did exactly what I wanted; however, on this second test it does not. While troubleshooting, it looks like it fails because "Retrieve Document Text" is reading down the columns instead of across the lines horizontally!
The pattern matching is supposed to grab the property description immediately after the roll number, but instead it grabs the roll number immediately below the target roll number. My pattern is: %(Roll Number)\s*(.*)[\r]
Can Workflow be made to remove lines, like in Quick Fields?
Reading this way:
Instead of this way:
That's because (.*) tries to match as much as possible, so it matches everything from 370100 all the way to the final \r.
You need to make it less greedy by excluding newlines from it. Something like ([^\r\n]*) in place of (.*) might get you closer ("any number of characters that are not newline characters").
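The greedy behavior described above can be reproduced in a small Python sketch. It assumes, for illustration only, that the page text separates lines with a bare carriage return (`\r`); the real separators in the repository aren't known here. In Python, as in .NET, `.` matches any character except `\n`, so it happily runs across `\r` characters until the last one it can pair with `[\r]`.

```python
import re

# Assumption: lines separated by a bare carriage return (\r), no \n.
TEXT = "370100 SW-01-044-13-4\r365700 SE-28-043-13-4\r"

# (.*) runs to the end of the text, then backs off just far enough
# for [\r] to match the final carriage return.
greedy = re.search(r"370100\s*(.*)[\r]", TEXT).group(1)
# [^\r\n]* can never cross a line break, so it stops at the first \r.
bounded = re.search(r"370100\s*([^\r\n]*)", TEXT).group(1)

print(repr(greedy))   # 'SW-01-044-13-4\r365700 SE-28-043-13-4'
print(repr(bounded))  # 'SW-01-044-13-4'
```

Note that a character class such as `[^\r\n]` matches exactly one character on its own; it needs a quantifier (`*` or `+`) to stand in for `(.*)`.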
I have tried that and a number of other options, but I keep running into the same problem: the text retrieval reads the columns first, so the pattern matching literally cannot see what comes after each roll number. So my question now is: "How can I make the Retrieve Text activity read across each line instead of down each column before moving on to the next column?"
The SW-01-044-13-4 address you see in the test value is actually the address that shows up on the same line as the 370100 number on this particular test page.
Retrieve Text does not do any processing; it just reads the text page as-is (exactly as you'd see it in the web or desktop client). In this case, assuming your test value came from the page, it looks like when the image was OCRed, it was done with de-columnize on, so instead of reading the page line by line, the OCR engine was instructed to read it by table columns.
Workflow can't fix that, you'll have to re-OCR the image to fix the text.
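A quick Python sketch shows why no change to the pattern can compensate for de-columnized text. The two sample strings are hypothetical: the roll numbers and addresses come from this thread, but the layout, the line endings, and the 158.5 value are made up. Both the original pattern and a narrower `([^\r\n\s]+)` variant find the address in line-by-line text, and both return the next roll number in the de-columnized text, because `\s*` also matches `\r` and `\n` and so steps onto the following line.

```python
import re

# Hypothetical page text read line by line.
LINE_BY_LINE = "370100 SW-01-044-13-4 163.0\r\n365700 SE-28-043-13-4 158.5\r\n"
# The same values ordered one column at a time, as de-columnized OCR might emit them.
DECOLUMNIZED = "370100\r\n365700\r\nSW-01-044-13-4\r\nSE-28-043-13-4\r\n163.0\r\n158.5\r\n"

# %(Roll Number) already replaced by 370100.
ORIGINAL = r"370100\s*(.*)[\r]"
NARROWED = r"370100\s*([^\r\n\s]+)"


def capture(pattern: str, text: str) -> str:
    """Return the first capture group, or an empty string if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""


for name, text in (("line by line", LINE_BY_LINE), ("decolumnized", DECOLUMNIZED)):
    print(f"{name}: {capture(ORIGINAL, text)!r} / {capture(NARROWED, text)!r}")
# line by line: 'SW-01-044-13-4 163.0' / 'SW-01-044-13-4'
# decolumnized: '365700' / '365700'
```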
That was it, Miruna! That is why it was gathering the text the wrong way. Thanks! Now I need to see why they were OCR'ing that way and see if I can safely make that change on all the targeted records!
Also, I need to find a way to make it stop at the end of the address I want collected. I don't want the 163.0 that is in the third column.
You're still using (.*), which will try to get as much as possible, so it needs some more narrowing down. If those addresses don't have spaces in them, try something like ([^\r\n\s]+) instead of (.*) ("at least one character, but not a newline or a space"). Or, if they always follow that format of five groups of characters delimited by dashes, we can work with that.
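If the addresses really do always have five dash-separated groups, the pattern can describe that shape directly. This Python sketch assumes each group is made of capital letters or digits, based only on the two examples in this thread (SW-01-044-13-4 and SE-28-043-13-4); the helper `address_after` and the sample page text are hypothetical. The third-column value (163.0) is never captured because it doesn't fit the shape.

```python
import re
from typing import Optional

# Assumption: five groups of capital letters/digits joined by dashes.
ADDRESS = r"[A-Z0-9]+(?:-[A-Z0-9]+){4}"


def address_after(page_text: str, roll_number: str) -> Optional[str]:
    """Hypothetical helper: return the address that follows roll_number, if any."""
    # [ \t]+ requires the address to be on the same line as the roll number.
    pattern = re.escape(roll_number) + r"[ \t]+(" + ADDRESS + r")"
    match = re.search(pattern, page_text)
    return match.group(1) if match else None


# Hypothetical page text; the 158.5 value is made up.
page = "370100 SW-01-044-13-4 163.0\r\n365700 SE-28-043-13-4 158.5\r\n"
print(address_after(page, "365700"))  # SE-28-043-13-4
```

Compared with `([^\r\n\s]+)`, this is stricter: a line where something other than an address follows the roll number yields no match instead of the wrong text.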
Your new combination works, and so does \s*(.*)[^\d?\.\r\n]
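The second pattern works through backtracking: `(.*)` first takes the whole line, then gives characters back until the next character is not a digit, `?`, `.`, `\r` or `\n`. On a line that ends in "163.0", that character is the space before 163.0, so the capture ends right after the address. A Python sketch (both sample lines are hypothetical, and the second one, with nothing after the address, is an assumed case) shows the catch: when nothing follows the address, the last allowed character is the final dash, so the trailing "-4" is lost.

```python
import re

PATTERN = r"365700\s*(.*)[^\d?\.\r\n]"

with_area = re.search(PATTERN, "365700 SE-28-043-13-4 163.0\r\n")
# Hypothetical line where nothing follows the address.
without_area = re.search(PATTERN, "365700 SE-28-043-13-4\r\n")

print(with_area.group(1))     # SE-28-043-13-4
print(without_area.group(1))  # SE-28-043-13
```

If the addresses never contain spaces, `([^\r\n\s]+)` does not depend on what follows the address, which makes it the sturdier of the two.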
But while testing the regex gives me the right result, testing the workflow gives me either a blank result or the roll number (365700) instead of the address (SE-28-043-13-4).
When actually running the workflow:
Right, because you're running it on the Roll Number value. The input is just 365700, and the pattern asks for any character (or any character that's not a newline or a space), so it matches the digits of the roll number itself.
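The difference between matching against the token value and matching against the page text can be shown in a few lines of Python. The sample line is hypothetical, and the roll number is written into the pattern the way Pattern Matching would expand a token. Run against "365700" alone, the pattern can only ever return the number; the address has to be in the input text for the pattern to find it.

```python
import re

ADDRESS_PART = r"\s*([^\r\n\s]+)"

roll_number = "365700"
page_line = "365700 SE-28-043-13-4 163.0\r\n"  # hypothetical page text

# Input is only the token value: the first non-space run is the number itself.
from_token = re.search(ADDRESS_PART, roll_number).group(1)
# Input is the page text, with the roll number written into the pattern.
from_page = re.search(re.escape(roll_number) + ADDRESS_PART, page_line).group(1)

print(from_token)  # 365700
print(from_page)   # SE-28-043-13-4
```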
So we need to look at why your Roll Number token returns just the number and not the rest of the line. How is that token defined?
Here is a look at the Roll # config. The parent folder names follow a naming convention that requires 8 digits for the roll # (because of how another workflow creates the folders for me), so for testing I'm telling it to remove two zeros to get the test number that has 6 digits.
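How the two zeros are removed matters. This Python sketch assumes the folder name is the roll number left-padded with zeros (for example "00365700" for 365700); that padding scheme and the helper `roll_from_folder` are hypothetical, since the actual config isn't shown here. Keeping the rightmost six digits leaves zeros inside the number alone, whereas removing every "00" would also strip the trailing zeros of 365700.

```python
def roll_from_folder(folder_name: str, width: int = 6) -> str:
    """Hypothetical helper: drop left-padding zeros from an 8-digit folder name.

    Assumes the folder name is the roll number left-padded with zeros.
    """
    if len(folder_name) != 8 or not folder_name.isdigit():
        raise ValueError(f"expected 8 digits, got {folder_name!r}")
    # Keep only the rightmost `width` digits, so zeros inside the number survive.
    return folder_name[-width:]


print(roll_from_folder("00365700"))  # 365700
print("00365700".replace("00", ""))  # 3657 (every "00" removed, not just the padding)
```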