Andornot Blog

Wednesday, February 24, 2010 3:18 PM

Replace MS Word special characters in javascript and C#

by Peter Tyrrell

MS Word uses characters from the Windows-1252 character encoding set which are not represented in ASCII or ISO-8859-1. This is often a pain in the butt. Special characters include:

  • the… ellipsis
  • ‘smart’ “quotes”
  • en – dash and em — dash
  • dagger † and double dagger ‡
  • and more, but these are most common.

If you want to replace them with ASCII cognates, here's a function to do that. (Daggers don't have cognates as far as I know.)

Javascript

/// Replaces commonly-used Windows 1252 encoded chars that do not exist in ASCII or ISO-8859-1 with ISO-8859-1 cognates.
var replaceWordChars = function(text) {
    var s = text;
    // smart single quotes and apostrophe
    s = s.replace(/[\u2018|\u2019|\u201A]/g, "\'");
    // smart double quotes
    s = s.replace(/[\u201C|\u201D|\u201E]/g, "\"");
    // ellipsis
    s = s.replace(/\u2026/g, "...");
    // dashes
    s = s.replace(/[\u2013|\u2014]/g, "-");
    // circumflex
    s = s.replace(/\u02C6/g, "^");
    // open angle bracket
    s = s.replace(/\u2039/g, "<");
    // close angle bracket
    s = s.replace(/\u203A/g, ">");
    // spaces
    s = s.replace(/[\u02DC|\u00A0]/g, " ");
    
    return s;
}

C# extension method

public static string ReplaceWordChars(this string text)
        {
            var s = text;
            // smart single quotes and apostrophe
            s = Regex.Replace(s, "[\u2018|\u2019|\u201A]", "'");
            // smart double quotes
            s = Regex.Replace(s, "[\u201C|\u201D|\u201E]", "\"");
            // ellipsis
            s = Regex.Replace(s, "\u2026", "...");
            // dashes
            s = Regex.Replace(s, "[\u2013|\u2014]", "-");
            // circumflex
            s = Regex.Replace(s, "\u02C6", "^");
            // open angle bracket
            s = Regex.Replace(s, "\u2039", "<");
            // close angle bracket
            s = Regex.Replace(s, "\u203A", ">");
            // spaces
            s = Regex.Replace(s, "[\u02DC|\u00A0]", " ");
            
            return s;
        }
Thursday, September 03, 2009 12:04 PM

Highlight search terms with jQuery

by Peter Tyrrell

Overview

Highlight words and phrases within specified elements on the page. Search syntax is trimmed or eradicated, stopwords and words less than 3 characters  are ignored.

Ingredients

You will need:

   1:  
   2: /*
   3:     methods to help highlight search words and terms 
   4:     depends on jquery
   5:     Peter Tyrrell, August 2009
   6: */
   7:  
   8: // 
   9: var highlightTermsIn = function(jQueryElements, terms) {
  10:     var wrapper = ">$1<b style='font-weight:normal;color:#000;background-color:rgb(255,255,102)'>$2</b>$3<";
  11:     for (var i = 0; i < terms.length; i++) {
  12:         var regex = new RegExp(">([^<]*)?("+terms[i]+")([^>]*)?<","ig");
  13:         jQueryElements.each(function(i) {
  14:             $(this).html($(this).html().replace(regex, wrapper));
  15:         }); 
  16:     };
  17: }
  18:  
  19: // returns array of unique search terms (words, phrases) found in value        
  20: var parseSearchTerms = function(value) {
  21:     
  22:     // split string on spaces and respect double quoted phrases
  23:     var splitRegex = /(\u0022[^\u0022]*\u0022)|([^\u0022\s]+(\s|$))/g;
  24:     var rawTerms = value.match(splitRegex);
  25:     
  26:     var terms = [];            
  27:     for (var i = 0; i < rawTerms.length; i++) {
  28:         
  29:         // trim whitespace, quotes, apostrophes and query syntax special chars
  30:         var term = rawTerms[i].replace(/^[\s\u0022\u0027+-][\s\u0022\u0027+-]*/, '').replace(/[\s*~\u0022\u0027][\s*~\u0022\u0027]*$/, '').toLowerCase();
  31:         
  32:         // ignore if <= 2 chars
  33:         if (term.length <= 2) {
  34:             continue;
  35:         }
  36:         
  37:         // ignore stopwords
  38:         var stopwords = ["about","are","from","how","that","the","this","was","what","when","where","who","will","with","the"];
  39:         var isStopword = false;
  40:         for (var j = 0; j < stopwords.length; j++) {
  41:             if (term == stopwords[j]) {
  42:                 isStopword = true;
  43:                 break;
  44:             }
  45:         }
  46:         if (isStopword === true) {
  47:             continue;
  48:         }
  49:         
  50:         // add term to term list
  51:         terms[terms.length] = term;
  52:     }
  53:     return terms;
  54: }

Example 1

Pass an array of terms to be highlighted in jquery-selected elements:

   1: <script type="text/javascript">
   2:     $(document).ready(function() {
   3:         var searchTerms = ["banana", "monkey"];
   4:         // highlight valid terms in search results          
   5:         highlightTermsIn($("#HighlightWrapper"), searchTerms);        
   6:     });
   7: </script>

Example 2

Parse raw search input to strip out stopwords, query syntax characters, and wee short words less than 3 characters that do nobody any good. Quoted phrases are treated as a single term.

   1: <script type="text/javascript">
   2:     var rawSearch = "give the banana* to a monkey";
   3:     var termsToHighlight = parseSearchTerms(rawSearch);
   4:     // termsToHighlight now = ["give", "banana", "monkey"]
   5: </script>

Example 3

Put it all together to retrieve raw search input from the query string, parse out terms to highlight, and highlight within specified containers.

   1: <script type="text/javascript">
   2:     $(document).ready(function() {
   3:         // get quick search value from query string
   4:         var quickSearch = $.query.get("q");
   5:         // highlight valid terms in divs marked class="HighlightWrapper"        
   6:         highlightTermsIn($("div.HighlightWrapper"), parseSearchTerms(quickSearch));
   7:     });     
   8: </script>

Acknowledgements

 

Monday, March 09, 2009 9:09 AM

really truly private variables in javascript

by Peter Tyrrell

I just read, and then for good measure re-read, "JavaScript: The Good Parts" by Douglas Crockford, who is probably the foremost javascript authority on planet Earth. The book blew my mind. I thought I knew javascript; I thought I had a pretty good grasp of it; before I picked up this book I would have referred to myself as a javascript expert when introducing myself at parties. It turns out I had a lot more to think about. I find this delightful.

One of the valuable lessons I learned is a javascript module pattern, discussed with examples at the YUI (Yahoo! User Interface) blog: http://yuiblog.com/blog/2007/06/12/module-pattern/. The take home message is that you can create objects that support private members. I didn't even know that was possible, but it turns out that it is, due to function scope and the concept of closure. It took me a few reads with furrowed brow to grok closure, so I can hardly explain it quickly, but essentially:

  1. A function can return a function and then wink out of existence.
  2. The returned inner function retains access to other members and data defined in its original parent function.
  3. Those other members and data are not directly accessible anymore, so they are private.

Fantastic.

Tuesday, August 05, 2008 11:15 PM

DB/Text trace log format method

by Peter Tyrrell

Further to my post of May 2007, where I described how to create a trace log in DB/Text and handle exceptions, here is a method that is handy for building more complex log statements.

   1: // construct formatted log statement
   2: function logFormat(val, arg1, arg2, arg3)
   3: {
   4:     var formattedVal = val;
   5:     if (arg1 != null)    {
   6:         formattedVal = formattedVal.replace("{0}", arg1);
   7:     }
   8:     if (arg2 != null) {
   9:         formattedVal = formattedVal.replace("{1}", arg2);
  10:     }
  11:     if (arg3 != null) {
  12:         formattedVal = formattedVal.replace("{2}", arg3);
  13:     }
  14:     log(formattedVal);
  15: }
  16:  
  17: // write log statement to form box
  18: function log(val)
  19: {
  20:     var box = Form.boxes("boxDebugLog");
  21:     if (box == null)
  22:         return;    
  23:     
  24:     if (box.content != '')
  25:     {
  26:         box.content += "\n";    
  27:     }    
  28:     box.content += val;
  29: }

Overview

logFormat takes a string value and up to 3 arguments, and replaces tokens found in the string value with those arguments. It's based on C#'s string.Format method, and others like it.

Example

It's easier to read (and write) a format string that contains tokens than a string concatenated together with plus signs.

   1: var count = 10;
   2: var name = "Peter";
   3: var duration = 30;
   4:  
   5: function LogTheOldWay()
   6: {
   7:     log(name + " was lashed with a wet noodle " + count + " times for " + duration + " seconds.");
   8: }
   9:  
  10: function LogTheNewWay()
  11: {    
  12:     logFormat("{0} was lashed with a wet noodle {1} times for {2} seconds.", name, count, duration);
  13: }

Update

My example above only proves how piddly my skills really are. There's a *much* better way of doing this that allows for n arguments instead of a maximum of three. I had forgotten that every javascript function has a local property called arguments that contains all parameters passed to the function as an array.

   1: function logFormat(val)
   2: {
   3:     var formattedVal = val;
   4:   for(i = 1; i < arguments.length; i++)
   5:   {
   6:     formattedVal = formattedVal.replace("{" + (i - 1) + "}", arguments[i]);
   7:   }
   8:   return formattedVal;
   9: }
Tuesday, August 05, 2008 10:55 PM

"Static" properties in javascript. Kinda.

by Peter Tyrrell

The following gives me the ability to access javascript properties in a static kind of way. Not really, since the object has been instantiated, but at least I don't have to define a specific object, then call its constructor all the time to new it up and get at its properties.

   1: // declare globally
   2: var Artist = {
   3:     Fields: {
   4:         Id: "ID",
   5:         FullName: "Term",
   6:         FirstName: "First Name",
   7:         LastName: "Last Name",
   8:         Birth: "Date of Birth",
   9:         Death: "Date of Death"
  10:     }
  11: };

Overview

I like to store DB/Text field names in this way, as they become (pseudo) strongly-typed and the magic strings are stored in one place only, instead of scattered across my code like dandelion seeds.

It can be difficult to work in javascript after C# and other full-featured languages. This is the closest I could come to static properties or a struct. If I weren't scripting in DB/Text, a proprietary closed system, I would be using jQuery or mootools, which put the joy back in joyvascript.

Example

Now I can access field names statically as follows. Look ma, no constructors!

   1: function SomeMethod()
   2: {
   3:     var idFieldName = Artist.Fields.Id;
   4: }

Month List