Question

What is the regular expression to remove the string "abc\" from a token?

asked on February 19, 2015

I'm trying to remove the string "abc\" from a string, such as abc\AA1234, to obtain the result value AA1234, but I'm doing something wrong.

 

I tried abc\(\d+)

 

And the result value says parsing "abc\(\d+)" - Too many )'s.

 

What should the regular expression be?
 


Answers

APPROVED ANSWER
replied on February 23, 2015

Deborah, Miruna's solution didn't work for you not because of the double backslash, but because of the (\d+). On the other hand, Derek's solution worked for you because of the (\S+). abc\W(\S+) might work here, but it will also "work" on something like

 

abc#AA1234

or

abc&AA1234

 

Or any string where the fourth character is something other than a letter, a digit, or an underscore, since that's what \W means. This might be what you want, but from your original description, it likely is not.

As for the part following that, (\d+) doesn't work because the backslash is followed by AA, which is not matched by \d+ since \d only matches digits (it would, however, work on an input like abc\123456). In the suggested solution, (\S+) worked because it matches one or more characters that are not whitespace. That means the suggested solution would also match abc\#$%^& by trimming the abc\. This might be what you want, but it might also not.
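To make the difference concrete, here is a small sketch in Python. It assumes Python's re module treats these particular patterns the same way your pattern matching does, which is true of most regex engines for these basics but is not something I've verified in your product. The capture helper is just a name made up for this example, and it requires the whole input to match.

```python
import re

# \W accepts any single non-word character, not only a backslash.
loose = re.compile(r"abc\W(\S+)")
# \\ accepts only a literal backslash; \d+ then requires digits.
digits_only = re.compile(r"abc\\(\d+)")

def capture(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.fullmatch(text)
    return match.group(1) if match else None

print(capture(loose, r"abc\AA1234"))        # AA1234
print(capture(loose, "abc#AA1234"))         # AA1234 (probably not intended)
print(capture(loose, r"abc\#$%^&"))         # #$%^&
print(capture(digits_only, r"abc\AA1234"))  # None, because AA is not digits
print(capture(digits_only, r"abc\123456"))  # 123456
```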

From your description, the regex you'd most likely want would be

abc\\(\w+)

or

abc\\(\S+)

or maybe even

abc\\(.*)

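(everything after the abc\, including spaces and other words, in case the input you are trimming from is the entirety of the input you want; note that in most regex engines the . does not match a line break unless a single-line or "dot matches all" option is turned on, so by default this captures only up to the end of the line)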

Which one to use depends on how restrictive you want the capture to be.
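Here is a rough Python comparison of those three patterns against one input, to show how much each capture takes. The sample text is made up for illustration, and the line-break behavior shown is Python's default; other engines may be configured differently.

```python
import re

# Made-up sample input: one literal backslash after "abc", then a second line.
text = "abc\\AA-12 34\nnext line"

patterns = {
    "word": r"abc\\(\w+)",      # letters, digits, underscore only
    "nonspace": r"abc\\(\S+)",  # anything up to the first whitespace
    "rest": r"abc\\(.*)",       # anything up to the end of the line
}

captures = {name: re.search(p, text).group(1) for name, p in patterns.items()}
print(captures)
# {'word': 'AA', 'nonspace': 'AA-12', 'rest': 'AA-12 34'}

# With DOTALL, the dot also matches the line break, so the capture runs to the end.
rest_dotall = re.search(r"abc\\(.*)", text, re.DOTALL).group(1)
print(repr(rest_dotall))  # 'AA-12 34\nnext line'
```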

SELECTED ANSWER
replied on February 19, 2015

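The problem is that the "\" has a special function in the expression: it escapes the character that follows it. In abc\(\d+) the \( is read as a literal opening parenthesis, so the closing ) has nothing to pair with, which is why you get the "Too many )'s" error. Use this pattern instead.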

abc\W(\S+)


Replies

replied on February 19, 2015

abc\\(\d+)

\ is a special character that escapes the character after it, so that character is treated literally rather than as regular expression syntax. To match a literal backslash, use two of them.
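A quick way to see the escaping rule in action is the sketch below. It uses Python's re module as a stand-in, so the error text will differ from the "Too many )'s" message in the original question, but the cause is the same. Note that the doubled backslash fixes the parsing error only; the \d+ still has to match what follows.

```python
import re

# One backslash before "(" escapes the parenthesis, so the ")" has no partner.
try:
    re.compile(r"abc\(\d+)")
    error = None
except re.error as exc:
    error = str(exc)

print(error)  # an "unbalanced parenthesis" style message

# Two backslashes match one literal backslash, and the group parses correctly.
escaped = re.compile(r"abc\\(\d+)")
print(escaped.fullmatch(r"abc\123456").group(1))  # 123456
print(escaped.fullmatch(r"abc\AA1234"))           # None, because AA is not digits

# re.escape builds the escaped form of a literal prefix for you.
print(re.escape("abc\\"))  # abc\\
```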

replied on February 19, 2015

Thank you!  I had a feeling it was due to the backslash.  The additional backslash didn't work, but abc\W(\S+) worked.

replied on February 20, 2015

I have heard of the double backslash and have tried it several times and never been able to get it to work either. 

replied on March 17, 2015

I have a similar situation, but I want to remove both prefix characters and suffix characters. The name is acct00123.pdf and I want to remove acct and .pdf. How do I do that in pattern matching?

I thought I had it using your examples but I am not getting any results when I test.

 

replied on March 17, 2015

Is the prefix always "acct"? Is the suffix always "pdf"? Or do you know something a bit more general, such as "the prefix always has letters but no numbers and the suffix always follows a period"?

If it's the former case (always acct and always pdf), then you can simply do this for your pattern:

     acct(\d+)\.pdf

If it's more general (i.e., the prefix can be multiple letters and the suffix is any letters or numbers after the period), then you can use a broader pattern:

      [a-zA-Z]+(\d+)\.\w+

I'll explain this in parts:

[a-zA-Z]+   :    One or more of any lowercase or uppercase alphabetical letter. More specifically, letters a through z, and A through Z. 

(\d+)          :   The numbers you want to "capture". \d+ means "one or more digits" (i.e., effectively the same as [0-9]+, one or more characters from 0 to 9). The parentheses turn this into a capture group, meaning the pattern match will result in just the bit you capture, effectively trimming out the rest of the input.

\.                :  The period, escaped. In a regex, an unescaped period is a wildcard; as in, it can represent any character. By putting a backslash, "\", in front of it, you tell the regex you want to match exactly the period character. This backslash is very important here!

\w+           :  One or more word characters. \w matches any letter, any digit, and the underscore. This lets the pattern match suffixes like .pdf but also lets it match suffixes with numbers, like .7z
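If you want to try both patterns outside the product first, here is a minimal Python sketch. It assumes the whole file name has to match the pattern; invoice42.7z and acct00123xpdf are made-up names, used only to show the general pattern and what goes wrong without the escaped period.

```python
import re

specific = re.compile(r"acct(\d+)\.pdf")       # fixed prefix and suffix
general = re.compile(r"[a-zA-Z]+(\d+)\.\w+")   # any letters, digits, any extension
unescaped = re.compile(r"acct(\d+).pdf")       # the "." here is a wildcard

print(specific.fullmatch("acct00123.pdf").group(1))   # 00123
print(general.fullmatch("acct00123.pdf").group(1))    # 00123
print(general.fullmatch("invoice42.7z").group(1))     # 42

# Without the backslash, any character is accepted where the period should be.
print(unescaped.fullmatch("acct00123xpdf").group(1))  # 00123
print(specific.fullmatch("acct00123xpdf"))            # None
```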

 

Oh, I also recommend posting this as a new post here for visibility.

 

Let me know if you have any questions =)

replied on March 18, 2015

Thank you Flavio that worked perfectly!
