Andornot Consulting Inc.

Friday, September 21, 2007

Remove Diacritics

WebPublisher sometimes has problems with diacritic characters, and XML input requires that field names contain no diacritics AT ALL. So here is a *very* useful C# method that strips diacritics from any string, replacing each accented character with its non-diacritic equivalent.

[Found at Sorting It All Out. Written in C# .NET 2.0.]

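using System.Globalization;
using System.Text;

// Place this method inside a class of your choice.
public static string RemoveDiacritics(string input)
{
    // Form D splits each accented character into a base character plus combining marks.
    string stFormD = input.Normalize(NormalizationForm.FormD);
    int len = stFormD.Length;
    StringBuilder sb = new StringBuilder(len);
    for (int i = 0; i < len; i++)
    {
        // Keep every character except the combining (non-spacing) marks.
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[i]);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[i]);
        }
    }
    // Recompose whatever is left into Form C.
    return sb.ToString().Normalize(NormalizationForm.FormC);
}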
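
A quick way to see what the method does is to call it on a few sample strings. The sketch below is illustrative only: the class name DiacriticsDemo and the sample inputs are made up for this example, and the method body simply repeats the technique shown above so the program compiles on its own. The literals use \u escapes so the result does not depend on the source file's encoding. Note that only characters with a Unicode decomposition are affected; a letter such as "ø" has no base-plus-mark decomposition, so it passes through unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical demo class; the name is an assumption made for this example.
public static class DiacriticsDemo
{
    // Same technique as the method above, repeated so this file is self-contained.
    public static string RemoveDiacritics(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop combining marks, keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // "caf\u00e9" is "café"; the accent is removed, giving "cafe".
        Console.WriteLine(RemoveDiacritics("caf\u00e9"));

        // "\u00c5ngstr\u00f6m" is "Ångström"; both marks are removed, giving "Angstrom".
        Console.WriteLine(RemoveDiacritics("\u00c5ngstr\u00f6m"));

        // "S\u00f8ren" is "Søren"; U+00F8 has no decomposition, so the string is returned unchanged.
        Console.WriteLine(RemoveDiacritics("S\u00f8ren"));
    }
}
```

If characters like "ø" or "ł" also need to be mapped to plain ASCII for field names, they would have to be handled separately, for example with an explicit replacement table applied after this method.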
