Remove Diacritics

by Peter Tyrrell Friday, September 21, 2007 3:59 PM

WebPublisher sometimes has problems with diacritic characters, and XML input actually requires that you do not include diacritics in field names AT ALL, so here is a *very* useful C# method that will strip diacritics from any string, replacing them with their non-diacritic equivalents.

[Found at Sorting It All Out. Written in C# .NET 2.0.]

using System.Globalization;
using System.Text;
public static string RemoveDiacritics(string input)
{
    string stFormD = input.Normalize(NormalizationForm.FormD);
    int len = stFormD.Length;
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < len; i++)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[i]);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[i]);
        }
    }
    return (sb.ToString().Normalize(NormalizationForm.FormC));
}

Tags:

blog comments powered by Disqus

Month List