Point Deep

Mundeep's Tech Blog

Remove Non-Alphanumeric Characters from a String

Posted by mundeep on March 7, 2008

A colleague was looking for an easy way to remove all non-alphanumeric characters from a string, and after some searching the easiest way turned out to be Regex.Replace(), as follows:

Regex.Replace(stringToCleanUp, @"[\W]", "");

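\w (lowercase) matches any ‘word’ character. In .NET this is Unicode-aware by default: it covers letters, decimal digits and the underscore (plus other connector punctuation and combining marks), and is equivalent to [a-zA-Z0-9_] only when RegexOptions.ECMAScript is set.
\W matches any ‘non-word’ character, i.e. anything NOT matched by \w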

As an alternative, if you don’t want to allow the underscore, you can use [^a-zA-Z0-9], which keeps only ASCII letters and digits.
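
A minimal sketch of the two patterns side by side. The class and method names (StringCleaner, RemoveNonWord, RemoveNonAsciiAlphanumeric) are hypothetical and only there for illustration. The two constants also show that the verbatim literal @"[\W]" and the regular literal "[\\W]" describe the same pattern.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the post.
public static class StringCleaner
{
    // Verbatim literal: the backslash reaches the regex engine unchanged.
    public const string NonWordVerbatim = @"[\W]";

    // Regular literal: the backslash must be doubled to produce the same pattern.
    public const string NonWordEscaped = "[\\W]";

    // Removes every character matched by \W; letters, digits and '_' are kept.
    public static string RemoveNonWord(string input) =>
        Regex.Replace(input, NonWordVerbatim, "");

    // Removes everything except ASCII letters and digits, so '_' is removed too.
    public static string RemoveNonAsciiAlphanumeric(string input) =>
        Regex.Replace(input, "[^a-zA-Z0-9]", "");
}
```

For the input "Hello, World_2008!", RemoveNonWord returns "HelloWorld_2008" and RemoveNonAsciiAlphanumeric returns "HelloWorld2008".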

The following regular expression quick reference helped in finding this solution:
Regular Expressions Quick Reference

16 Responses to “Remove Non-Alphanumeric Characters from a String”

  1. risingsuns said

    Thanks for the tip. It worked like a champ!

  2. Allen said

    Thanks for the info. Saved some trial and error.

  3. cori said

    Thanks for the tip – very helpful.

    Small typo in the code, though. As the description states should be “[/W]” and not “[\W]”.

  4. mundeep said

    Sorry, my description incorrectly stated /W. I have now corrected this (the code was already correct; see the examples on the two pages I have linked in the blog post).

  5. Ruben Misrahi said

    Just remember that an additional back-slash is required:
    Regex.Replace(stringToCleanup, "[\\W]", "");

  6. Denise said

    Thanks for this tip. I used it today.

  7. Soul said

    how to apply this code on my wp post title code

  8. George Chakhidze said

    Here is an alternate implementation (LINQ to Objects in use):

    string alphaNumeric = new string(stringToCleanup.Where(ch => char.IsLetterOrDigit(ch)).ToArray());

    performs much faster than Regex.

  9. knut said

    George; that is brilliant! Impossible to remember Regex syntax…

  10. I think the code should be:

    Regex.Replace(stringToCleanUp, @"[\W]", "");

    [Added the ‘@’ before the string so that the “\” and its consecutive character are not recognized as an escape sequence]

  11. dnitch said

    “\W matches any ‘non-word’ character i.e. anything NOT matched by [a-zA-Z0-9_]” but does this work for extended ASCII characters such as é ñ Ü …
    What would be the pattern to remove all non-alphanumeric characters where “alphanumeric” includes those extended ASCII letters that should remain? In other words, we want to remove anything like ^ %#@$!… and all the other non-word symbols as well, but keep all the ASCII letters a-z,A-Z and the extended letters á-ž

  12. mundeep said

    @dnitch: I have not tested whether á-ž are included in \W, but you could try [^á-žÁ-Ža-zA-Z0-9] (the ^ at the front means NOT anything in the following ranges).

  13. mundeep said

    @Andreas: Thanks, I forgot to escape the \ and have corrected that now.

  14. Anonymous said

    Used this to clean up some australian abn/acn numbers… worked a treat…

  15. Wabbletini said

    Thanks for the tip which came in useful

  16. Anonymous said

    I used it today. Thanks.
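
The comments above raise two follow-ups: a LINQ alternative, and what happens to accented letters such as é, ñ and Ü. The sketch below puts both next to each other. The class and method names are hypothetical. It assumes the accented letters are stored as single precomposed characters, and it makes no claim about relative speed, which would need to be measured.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the thread.
public static class UnicodeCleaner
{
    // LINQ variant: keeps characters for which char.IsLetterOrDigit is true.
    // That check is Unicode-aware, so accented letters stay and '_' is removed.
    public static string KeepLettersAndDigits(string input) =>
        new string(input.Where(ch => char.IsLetterOrDigit(ch)).ToArray());

    // Regex variant: removes anything that is not a Unicode letter (\p{L})
    // or a decimal digit (\p{Nd}).
    public static string KeepUnicodeAlphanumeric(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{Nd}]", "");

    // With RegexOptions.ECMAScript, \W means "not [a-zA-Z0-9_]",
    // so accented letters are removed as well.
    public static string RemoveNonWordAscii(string input) =>
        Regex.Replace(input, @"\W", "", RegexOptions.ECMAScript);
}
```

For the input "Señor_Ünal: 100% café!", the first two methods return "SeñorÜnal100café", while RemoveNonWordAscii returns "Seor_nal100caf".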
