You are viewing limited content. For full access, please sign in.

Question

Question

Regex to pull first two columns from a comma separated file

asked on November 13, 2019

I want to pull the second string in a comma delimited list where the first value is numeric and the second is alpha.  

I'm using `\d[^,]+(?=,)` to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.

Here's part of a sample file that I'm trying to extract data from:

    Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)

//line break here is for clarity and does not exist in file//

    4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
    10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N

 

Since I'm already pulling the "Address Number" column successfully into a multi-value token, I am planning on creating a second multi-value token for the "Name" column and would like assistance with a regex that will work for that.  Thanks!

0 0

Replies

replied on November 14, 2019

Hello Mike,

 

Are you using SDK Activity in order to do this ?

if yes, you can get the first field by using a more simple regex as follows : "^(\d+),"

and in order to get the second field you can use the following : "^\d+,([\w+\s*]+),"

 

but i have a question, and in the same time a suggestion.

why don't you split the string read on "," and read the first two elements of the split array ?

like the following : "str.Split(",")(0)" and "str.Split(",")(1)" i think its easier than regex and groups.

 

Hope this Helped!

Maher.

 

 

 

 

 

 

1 0
replied on November 14, 2019

I strongly second the "split on comma" approach Maher suggests. Wouldn't roll your own regex unless there is an especially compelling reason to do so. The Workflow token editor has a built-in Split function that should prove useful.

Note that the split function adds a blank space at the end of each value. Additionally, if the single-value token has spaces between the delimiter and the next value those spaces will be retained. Combine this function with the "Trim" function to remove those extra spaces from the multi-value token.

0 0
replied on November 14, 2019

I've been playing around with this method, and it looks like it will do what I want except I would need it to skip any header rows of data.  How can I get it to skip any header rows?

0 0
replied on November 14, 2019

Not sure offhand without seeing your workflow. Can you delete the first value from each of your multi-value tokens afterward since those should always be the headers? Then do whatever you want with the resulting headerless tokens.

0 0
replied on November 14, 2019

If there's a way to delete the first row, I'd sure like to know how to do it!  smiley

I don't see anything in the toolbox that allows me to delete a value from a token. I suppose I could use a Conditional Decision in the For Each Value loop and only add field values to the token if they  match a pattern.  But, if you know how to just delete the first row that might be better.

 

 

0 0
replied on November 14, 2019

Here's a way that uses the multi-value token array index to selectively drop/not-add the initial header row. Put another way, it writes all values except the first from the original token array to a new token array. 

Best to avoid pattern matching when not strictly necessary.

Illustrative workflow:

Details on the Append Value at Index to NameNoHeader Token activity config:

0 0
replied on November 15, 2019

I appreciate that you put this together for an example.  I think I'm doing everything correctly but it only seems to be pulling the first row, which are column headers, in the EmpIdWithHeader task.  The condition for iteration >1 evaluates as False when I run the workflow.  Willing to share, but the file attachment in this forum doesn't allow Workflow files.

 

0 0
replied on November 15, 2019

Welcome! To upload Workflow files, simply add a .txt to the .wfi file and remove it afterward. I've attached my example for you to look at.

Looks like you're trying to do the Split and index-looping operations at the same time.

You need to split up the steps:

  1. Acquire raw CSV values
  2. Create a new token that you assign the CSV values and run the Split function on. This should end up like the "NameWithHeader" token in my example.
  3. Create a new token ("NameNoHeader") that you feed the token from step 2 into.

 

0 0
replied on November 14, 2019

I appreciate the suggestions!  I'm not familiar with the SDK yet as I'm a relatively new user and still getting my feet wet with Regex.  If either of you could provide a bit of a "how to" on using the SDK Activity for this task I would very much appreciate it.

 

Thanks,

Mike

 

0 0
replied on November 14, 2019

Hey Mike,

As a small point of clarification, you would use the "Script" activity rather than the "SDK Script" activity. The SDK Script activity simply the Script activity with the Laserfiche Repository Access (RA) .NET libraries pre-loaded. Unless you're connecting to your repository within the script, you don't need RA and it adds a bit of overhead for nothing.

Are you generally familiar with .NET code in C# or VB.NET? If so, simply go ahead and Google ".NET CSV parser". You'll find a bunch of examples on Stack Overflow and such. Nothing involves the actual Laserfiche SDK.

If not, I'd recommend trying to do what you need with native workflow functionality such as the Token Editor split function I mentioned.

Another potential option is getting a CSV-compatible ODBC driver, configuring your file as a Data Source in workflow, and querying it with something like "SELECT Address Number, Name". Workflow will automatically give you the results in separate multi-value tokens for each column.

0 0
replied on November 14, 2019 Show version history

Thanks, I will check that out.  (I do have extensive experience with VBA, so I think that will apply most directly to vb.net).

 

0 0
replied on November 14, 2019 Show version history

Definitely recommend starting with Microsoft's native TextFieldParser library for it then.

Microsoft - How to: read from comma-delimited text files in Visual Basic

Microsoft - TextFieldParser Class reference

Make sure to add Microsoft.VisualBasic.FileIO.TextFieldParser as a reference in your script!

Note to others that come across this post: The TextFieldParser library is still a .NET library that any CLR-compatible language (such as C#) can use, even though it is in the "VisualBasic" namespace.

Dot Net Pearls - C# TextFieldParser Examples: Read CSV

0 0
replied on November 15, 2019

Does Visual Basic/Visual Studio need to be installed to access the Microsoft.VisualBasic.FileIO.TextFieldParser?  It's not listed in the available References.

 

0 0
replied on November 15, 2019

It shouldn't. I confirmed on a Windows Server 2016 machine that it doesn't show up in the list by default for whatever reason. Here's how you can add the reference from the "Browse" option:

Make sure to select version 10 (sort by Name):

Click "Open" and it should then appear in the Workflow Add Reference screen:

Hit "OK" to add the Microsoft.VisualBasic reference. You can then successfully import the Microsoft.VisualBasic.FileIO namespace:

Let me know if that works for you!

0 0
You are not allowed to follow up in this post.

Sign in to reply to this post.