Remove Diacritics

by Peter Tyrrell Friday, September 21, 2007 3:59 PM

WebPublisher sometimes has problems with diacritic characters, and XML input actually requires that field names contain no diacritics at all. So here is a *very* useful C# method that strips diacritics from a string, replacing accented characters with their non-diacritic equivalents.

[Found at Sorting It All Out. Written in C# .NET 2.0.]

using System.Globalization;
using System.Text;

public static string RemoveDiacritics(string input)
{
    // Decompose each character into its base letter plus combining marks.
    string stFormD = input.Normalize(NormalizationForm.FormD);
    int len = stFormD.Length;
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < len; i++)
    {
        // Keep every character except the combining (non-spacing) marks.
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[i]);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[i]);
        }
    }
    return (sb.ToString().Normalize(NormalizationForm.FormC));
}
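
To see the method in action, here is a minimal sketch of a console program that wraps the same technique in a class and runs it on a few strings. The class name DiacriticsDemo and the sample field names are made up for illustration; they are not from WebPublisher. The last call shows a limitation worth knowing about: a letter such as ø (U+00F8) has no canonical decomposition in Unicode, so there is no combining mark to strip and it passes through unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical wrapper class, used here only so the example compiles on its own.
public static class DiacriticsDemo
{
    // Same technique as the method above: decompose, drop non-spacing marks, recompose.
    public static string RemoveDiacritics(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // Sample field names (assumed), written with \u escapes so the
        // source file encoding does not affect the result.
        Console.WriteLine(RemoveDiacritics("Cr\u00E9ateur"));  // Createur
        Console.WriteLine(RemoveDiacritics("Ann\u00E9e"));     // Annee

        // U+00F8 (o with stroke) does not decompose, so it is left as-is.
        Console.WriteLine(RemoveDiacritics("S\u00F8ren") == "S\u00F8ren");  // True
    }
}
```

If field names must end up as plain ASCII, characters like that would need a separate replacement step of your own. Note also that the method throws on a null input, so callers may want to check for null first.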

