Question

Regular expression to remove white space

asked on March 17, 2014

I am using Zone OCR to capture a value, for example G123456 A. I need to remove the blank space between the 6 and the A.

I have tried a number of variations, such as \w[0-9]+, but this returns only G123456. Any help would be appreciated.
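To see why \w[0-9]+ stops at G123456, here is a quick check. It uses Python's re module as a stand-in for the Zone OCR regex engine, which is an assumption: regex dialects can differ in details, although \w and [0-9] are basic tokens.

```python
import re

captured = "G123456 A"

# Python's re is only a stand-in for the OCR tool's regex engine (assumption).
# \w matches one word character (the "G"); [0-9]+ then matches the run of digits.
# The space is not a digit, so the match ends there and the "A" is never reached.
match = re.search(r"\w[0-9]+", captured)
print(match.group(0))  # G123456
```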

 

 



Replies

replied on March 17, 2014

Hi John,

 

If you don't know where the spaces will appear, try using the Pattern Matching activity (assuming you're working with workflows) as follows:

 

replied on March 17, 2014

In this case, you should use \s to indicate that you're expecting a whitespace character. The full regular expression would then be (\w+)\s(\w). The space is matched by \s, which sits outside both capturing groups, so it is left out of the captured value in your example. It may still need additional tweaks to fit your exact case.
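A minimal sketch of the idea, again using Python's re module as a stand-in. The \s consumes the space but is not captured, so joining the two groups gives the value without the space. Whether your Zone OCR version joins multiple groups this way is an assumption here; the helper name strip_space is hypothetical.

```python
import re
from typing import Optional


def strip_space(value: str) -> Optional[str]:
    # Hypothetical helper; Python's re stands in for the OCR regex engine.
    match = re.search(r"(\w+)\s(\w)", value)
    if match is None:
        return None  # no "word, whitespace, word character" sequence found
    # group(1) is "G123456" and group(2) is "A"; the space between them is
    # matched by \s but belongs to no group, so joining the groups drops it.
    return "".join(match.groups())


print(strip_space("G123456 A"))  # G123456A
```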

 

Please respond if you have any questions or mark this response as an approved answer if it answered your question!

replied on March 17, 2014

Thanks, that worked great.

replied on March 17, 2014

Hi Rob:

I have one other problem: in G123456 A, the A is sometimes the number 1. If it is a 1, I would like to discard it; if it is an A, I want to keep it. I thought using \w would handle that. Any suggestions?
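The reason \w keeps the 1 is that it is shorthand for any word character: letters, digits and the underscore. A short check, using Python's re module as a stand-in for the OCR engine (an assumption; Python's \w also covers non-ASCII letters and digits by default):

```python
import re

# \w accepts letters, digits and the underscore, so it cannot tell "A" from "1".
# Python's re is only a stand-in for the OCR tool's regex engine (assumption).
for ch in ["A", "1", "_", " "]:
    print(repr(ch), bool(re.fullmatch(r"\w", ch)))
# 'A' True
# '1' True
# '_' True
# ' ' False
```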

SELECTED ANSWER
replied on March 17, 2014

You could use the following expression: (\w+)\s([A-Za-z])*

To clarify the final piece ([A-Za-z])*:

This matches an uppercase [A-Z] or lowercase [a-z] letter, but not a digit. No | is needed between the two ranges: inside square brackets, | is an ordinary character and would simply match a literal pipe.

It also allows zero or more matches (because of the *).

This means that if a 1 is in the position of the A in your example, the expression won't return the digit, but it will still return the rest of the value. If you want to tighten it up a little, you can remove the a-z if you're not expecting the end character to be lowercase.
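A sketch of how this expression behaves, using Python's re module as a stand-in and assuming, as in the earlier reply, that the captured groups are joined into the result. When the trailing character is a digit, the second group does not participate at all, so it is treated as empty. The last two lines show why the | is not needed inside the brackets. The names PATTERN and clean are hypothetical.

```python
import re
from typing import Optional

# Python's re stands in for the OCR regex engine (assumption).
PATTERN = re.compile(r"(\w+)\s([A-Za-z])*")


def clean(value: str) -> Optional[str]:
    # Hypothetical helper: join the groups, as the OCR tool is assumed to do.
    match = PATTERN.search(value)
    if match is None:
        return None
    # groups(default="") turns a non-participating group into "" instead of None.
    return "".join(match.groups(default=""))


print(clean("G123456 A"))  # G123456A
print(clean("G123456 1"))  # G123456 (a digit is not a letter, so it is left out)

# Inside brackets "|" is an ordinary character, so [A-Z|a-z] also accepts a pipe.
print(bool(re.fullmatch(r"[A-Z|a-z]", "|")))  # True
print(bool(re.fullmatch(r"[A-Za-z]", "|")))   # False
```

Because the * sits outside the group, only the last letter is captured if several letters follow the space; ([A-Za-z]*) would capture all of them.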

replied on March 18, 2014

Thanks Rob, that did the trick.
