A few months ago I posted a C# routine that replaces diacritic characters with their non-diacritic equivalents. It helped me create valid field names for use with XML requests to Webpublisher, but it wasn't the whole story. There are other commandments that must be followed to get valid field names: spaces shall be replaced with underscores, hyphens shall be replaced with underscores, thou shalt not suffer a digit to start a field name without preceding it with an underscore, etc.

Here's the whole deal: pass in your unclean field names and get purified field names back out the other side.

using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Container class; the name is arbitrary, so put these methods wherever suits your project.
public static class FieldNameCleaner
{
    // Matches a digit at the very start of a string
    private static readonly Regex StartsWithDigit = new Regex(@"^\d");

    public static string[] CleanFieldNames(string[] fields)
    {
        string[] output = new string[fields.Length];

        for (int i = 0; i < fields.Length; i++)
        {
            string field = fields[i];
            // Replace spaces with underscores
            field = field.Replace(" ", "_");
            // Replace hyphens with underscores
            field = field.Replace("-", "_");
            // Precede digit at start of field name with underscore
            if (StartsWithDigit.IsMatch(field))
            {
                field = "_" + field;
            }
            // Replace extended chars with non-diacritic equivalent
            field = RemoveDiacritics(field);

            output[i] = field;
        }

        return output;
    }

    // Decompose accented characters (FormD), drop the combining marks,
    // then recompose whatever is left (FormC).
    public static string RemoveDiacritics(string input)
    {
        string stFormD = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(stFormD.Length);
        foreach (char c in stFormD)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(c);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
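If you're curious what RemoveDiacritics is actually throwing away, it helps to look at a string after FormD normalization. A precomposed letter such as é (U+00E9) decomposes into a plain base letter followed by a combining accent, and only the accent is classified as UnicodeCategory.NonSpacingMark. The DecompositionDemo class and its Describe method below are hypothetical diagnostic helpers, not part of the routine: they simply list each UTF-16 code unit of the decomposed string along with its Unicode category.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical diagnostic helper: shows what FormD normalization does to a
// string, one UTF-16 code unit per entry, with its Unicode category.
public static class DecompositionDemo
{
    public static List<string> Describe(string input)
    {
        var result = new List<string>();
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(c);
            // Each entry looks like "U+0065 LowercaseLetter"
            result.Add(string.Format("U+{0:X4} {1}", (int)c, uc));
        }
        return result;
    }
}
```

Calling DecompositionDemo.Describe("\u00E9") returns two entries, "U+0065 LowercaseLetter" and "U+0301 NonSpacingMark". RemoveDiacritics keeps the first and skips the second, which is how é ends up as a plain e.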

N.B. I always adhere to what Inmagic calls "SOAP Format" when making XML requests to Webpublisher. That way I follow a single set of field-name rules and don't have to worry about differences in output schema between CS Webpublisher and Dbtext Webpublisher. (SOAP Format is mandatory with CS Webpublisher but optional with Dbtext Webpublisher.)
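The cleaning routine only handles spaces, hyphens, a leading digit and diacritics, so a field name containing, say, a slash or an ampersand comes out the other side unchanged. If you want a safety net, here is a minimal sketch (the FieldNameCheck class and IsValidXmlFieldName method are hypothetical names) that uses .NET's XmlConvert.VerifyNCName to confirm each cleaned name is at least a legal XML name. Treat it as a necessary check rather than a sufficient one: XML itself permits hyphens and accented letters, so passing it does not prove Webpublisher will accept the name.

```csharp
using System;
using System.Xml;

// Hypothetical helper: checks whether a cleaned field name is a legal XML
// non-colonized name (NCName). This tests general XML naming rules only;
// it does not encode any Webpublisher-specific requirements.
public static class FieldNameCheck
{
    public static bool IsValidXmlFieldName(string name)
    {
        // Guard against null/empty up front rather than relying on
        // which exception VerifyNCName throws for them.
        if (string.IsNullOrEmpty(name))
        {
            return false;
        }
        try
        {
            // Throws XmlException if name is not a valid NCName.
            XmlConvert.VerifyNCName(name);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, "_2nd_Author" passes, while "2nd_Author" (leading digit) and "Date/Time" (slash) fail, so you could run this over the output array and log anything that still needs attention.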
