Highlight search terms with jQuery

by Peter Tyrrell Thursday, September 03, 2009 12:04 PM

Overview

Highlight words and phrases within specified elements on the page. Search syntax is trimmed or stripped out, and stopwords and words shorter than 3 characters are ignored.

Ingredients

You will need:

    /*
        methods to help highlight search words and terms
        depends on jquery
        Peter Tyrrell, August 2009
    */

    // wraps each term found in the text of the given elements in a highlighting <b> tag
    var highlightTermsIn = function(jQueryElements, terms) {
        var wrapper = ">$1<b style='font-weight:normal;color:#000;background-color:rgb(255,255,102)'>$2</b>$3<";
        for (var i = 0; i < terms.length; i++) {
            // escape regex special chars so the term is matched literally (e.g. "c++")
            var escapedTerm = terms[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
            var regex = new RegExp(">([^<]*)?(" + escapedTerm + ")([^>]*)?<", "ig");
            jQueryElements.each(function() {
                $(this).html($(this).html().replace(regex, wrapper));
            });
        }
    };

    // returns array of unique search terms (words, phrases) found in value
    var parseSearchTerms = function(value) {

        // split string on spaces and respect double quoted phrases
        var splitRegex = /(\u0022[^\u0022]*\u0022)|([^\u0022\s]+(\s|$))/g;
        // match() returns null when nothing matches, so fall back to an empty list
        var rawTerms = String(value || "").match(splitRegex) || [];

        var stopwords = ["about","are","from","how","that","the","this","was","what","when","where","who","will","with"];

        var terms = [];
        var seen = {};
        for (var i = 0; i < rawTerms.length; i++) {

            // trim whitespace, quotes, apostrophes and query syntax special chars
            var term = rawTerms[i].replace(/^[\s\u0022\u0027+-]+/, '').replace(/[\s*~\u0022\u0027]+$/, '').toLowerCase();

            // ignore if <= 2 chars
            if (term.length <= 2) {
                continue;
            }

            // ignore stopwords
            var isStopword = false;
            for (var j = 0; j < stopwords.length; j++) {
                if (term === stopwords[j]) {
                    isStopword = true;
                    break;
                }
            }
            if (isStopword) {
                continue;
            }

            // ignore terms already in the list, so the result stays unique
            if (seen[term] === true) {
                continue;
            }
            seen[term] = true;

            // add term to term list
            terms.push(term);
        }
        return terms;
    };

Example 1

Pass an array of terms to be highlighted in jquery-selected elements:

    <script type="text/javascript">
        $(document).ready(function() {
            var searchTerms = ["banana", "monkey"];
            // highlight valid terms in search results
            highlightTermsIn($("#HighlightWrapper"), searchTerms);
        });
    </script>
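
To see what highlightTermsIn does to the markup, here is a minimal sketch that applies the same regular expression and wrapper to a plain HTML string, with no jQuery involved. The sample markup and the variable names sampleHtml and highlighted are assumptions made for illustration.

```javascript
// same wrapper and regex construction as highlightTermsIn, applied to a plain string
var wrapper = ">$1<b style='font-weight:normal;color:#000;background-color:rgb(255,255,102)'>$2</b>$3<";
var regex = new RegExp(">([^<]*)?(" + "monkey" + ")([^>]*)?<", "ig");

// hypothetical markup standing in for the html() of a selected element
var sampleHtml = "<p>Feed the Monkey</p>";
var highlighted = sampleHtml.replace(regex, wrapper);
// highlighted is now:
// <p>Feed the <b style='font-weight:normal;color:#000;background-color:rgb(255,255,102)'>Monkey</b></p>
```

The term has to sit between a > and a <, so text inside tag attributes is left alone. The first group is greedy, so when a term appears more than once in the same run of text, a single pass wraps only the last occurrence.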

Example 2

Parse raw search input to strip out stopwords, query syntax characters, and wee short words less than 3 characters that do nobody any good. Quoted phrases are treated as a single term.

    <script type="text/javascript">
        var rawSearch = "give the banana* to a monkey";
        var termsToHighlight = parseSearchTerms(rawSearch);
        // termsToHighlight now = ["give", "banana", "monkey"]
    </script>
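
The quoted-phrase handling can be seen by running the same split and trim expressions on input that contains a phrase. This is a small sketch built from the regular expressions in parseSearchTerms; the sample input and the variable names rawTerms and trimmed are illustrative assumptions.

```javascript
// same expressions as parseSearchTerms, run on a hypothetical query with a quoted phrase
var splitRegex = /(\u0022[^\u0022]*\u0022)|([^\u0022\s]+(\s|$))/g;
var rawTerms = '"Howler Monkey" +banana*'.match(splitRegex);
// rawTerms has two entries: the phrase with its quotes, and +banana*

var trimmed = [];
for (var i = 0; i < rawTerms.length; i++) {
    trimmed.push(rawTerms[i]
        .replace(/^[\s\u0022\u0027+-]+/, '')   // leading whitespace, quotes, + and -
        .replace(/[\s*~\u0022\u0027]+$/, '')   // trailing whitespace, quotes, * and ~
        .toLowerCase());
}
// trimmed is ["howler monkey", "banana"]
```

The space inside the quotes survives because the first alternative of the split expression consumes everything up to the closing quote as one match.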

Example 3

Put it all together to retrieve raw search input from the query string, parse out terms to highlight, and highlight within specified containers.

    <script type="text/javascript">
        $(document).ready(function() {
            // get quick search value from query string
            var quickSearch = $.query.get("q");
            // highlight valid terms in divs marked class="HighlightWrapper"
            highlightTermsIn($("div.HighlightWrapper"), parseSearchTerms(quickSearch));
        });
    </script>
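
Note that $.query is not part of core jQuery, so Example 3 assumes a query string plugin is loaded. Where one is not available, a plain JavaScript helper along these lines could supply the value instead. This is only a sketch: getQueryValue is a hypothetical name, and in a browser the first argument would typically be window.location.search.

```javascript
// hypothetical helper, not part of the original script:
// returns the decoded value of the first parameter called name, or "" if absent
var getQueryValue = function(queryString, name) {
    var pairs = queryString.replace(/^\?/, "").split("&");
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf("=");
        var key = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
        var rawValue = eq === -1 ? "" : pairs[i].substring(eq + 1);
        if (decodeURIComponent(key) === name) {
            // "+" stands for a space in form-encoded query strings
            return decodeURIComponent(rawValue.replace(/\+/g, " "));
        }
    }
    return "";
};

var quickSearch = getQueryValue("?q=banana+%22howler+monkey%22&page=2", "q");
// quickSearch is: banana "howler monkey"
```

The returned string can then be handed to parseSearchTerms in place of the $.query.get("q") result.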

Acknowledgements

 

Tags: javascript

What I have learned the hard way (as usual) with VMware

by Peter Tyrrell Wednesday, May 27, 2009 11:41 AM

Always take snapshots when the guest is powered down.

Although in VMware Workstation you can take a snapshot at any time, you cannot clone a snapshot taken of a powered-on guest.

So for example, I patched a new Windows Server 2003 guest OS with dozens of Windows Updates, installed SQL Server, installed and configured SQL Server Reporting Services, etc. It took hours. I made snapshots at each phase, and indeed was able to revert to one after screwing up royally. However, when I later went to clone the "all patches" snapshot to re-use the guest somewhere else, I was unable to do so because the guest had been on when I took the snapshot. I pulled out all my teeth in frustration and smashed them with a hammer, then began uninstalling everything after the patches.

Check the guest firewall settings if you have host-to-guest network problems

If you have double-checked that the virtual network settings are not to blame for an inability of the host to communicate with the guest, then ensure the guest firewall settings are not blocking incoming requests.

In my case the guest was using bridged networking and could ping the host and connect to the internet via the LAN gateway. The host could not ping the guest, the reason being that the guest firewall disallowed incoming echo requests by default. Further, since I wanted to use the guest as an http server, I needed to allow http requests at the firewall level. The agony preceding this discovery was akin to a hot knife stabbing repeatedly into my liver.

32-bit guests created on a 64-bit host can only be deployed on a 32-bit host if the target host's hardware supports 64-bit processing

This one is counterintuitive. Just because it's a 32-bit guest doesn't mean it's going to run on a 32-bit host.

Newer machines tend to support 64-bit processing. Older ones don't. It's entirely possible that a host will be running a 32-bit OS but be 64-bit capable. How do you know? Download CPU-Z and it will examine the hardware and tell you. The VMware CPU Identification Utility might also help, but I'm unclear whether it checks only for a 64-bit OS or for 64-bit hardware capability.


The importance of beauty

by Peter Tyrrell Tuesday, April 21, 2009 10:55 AM

A fine article from the good people at A List Apart (the website for people who make websites). I always strive for aesthetically pleasing design in my work at Andornot. It just feels better when it looks good.

In Defense of Eye Candy
"Research proves attractive things work better. How we think cannot be separated from how we feel. The next time a boss, client, or co-worker scoffs at the notion that beauty is an important aspect of interface design, point their peepers here."

http://www.alistapart.com/articles/indefenseofeyecandy

 


really truly private variables in javascript

by Peter Tyrrell Monday, March 09, 2009 9:09 AM

I just read, and then for good measure re-read, "JavaScript: The Good Parts" by Douglas Crockford, who is probably the foremost javascript authority on planet Earth. The book blew my mind. I thought I knew javascript; I thought I had a pretty good grasp of it; before I picked up this book I would have referred to myself as a javascript expert when introducing myself at parties. It turns out I had a lot more to think about. I find this delightful.

One of the valuable lessons I learned is a javascript module pattern, discussed with examples at the YUI (Yahoo! User Interface) blog: http://yuiblog.com/blog/2007/06/12/module-pattern/. The take-home message is that you can create objects that support private members. I didn't even know that was possible, but it turns out that it is, due to function scope and the concept of closure. It took me a few reads with furrowed brow to grok closure, so I can hardly explain it quickly, but essentially:

  1. A function can return a function and then wink out of existence.
  2. The returned inner function retains access to other members and data defined in its original parent function.
  3. Those other members and data are not directly accessible anymore, so they are private.

Fantastic.
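
The three points above can be sketched in a few lines. This is a minimal illustration of the idea, not code from the book or the YUI article; makeCounter and its members are hypothetical names.

```javascript
// hypothetical example: a counter whose count can only be changed through its methods
var makeCounter = function() {
    // local to this call of makeCounter, so nothing outside can reach it directly
    var count = 0;

    // the returned functions keep access to count after makeCounter has finished
    return {
        increment: function() {
            count += 1;
            return count;
        },
        current: function() {
            return count;
        }
    };
};

var counter = makeCounter();
counter.increment();
counter.increment();
// counter.current() returns 2, while counter.count is undefined
```

Each call to makeCounter creates a fresh count, so two counters made this way do not share state.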

Tags: javascript

Inmagic Webpublisher and character encoding

by Peter Tyrrell Wednesday, March 04, 2009 1:40 PM

There is a lot of potentially confusing information when speaking of character sets and encodings, so here's how it all relates to Inmagic DB/Text and Webpublisher, according to my understanding.

Inmagic DB/Text supports the western Latin character set, which covers the characters used in Western European languages. (All the characters you would find in ANSI/ASCII.) It does not support Unicode, which is the Ultimate Character Set For Every Character Ever Conceived, Even Klingon.

Inmagic Webpublisher supports two types of character encoding: ISO-8859-1 and UTF-8. Encodings are computer representations of character sets. ISO-8859-1 encodes just the western Latin character set, while UTF-8 encodes the western Latin character set plus the rest of Unicode. Thus you can ask Webpublisher to encode DB/Text data output to the browser in either.
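
To make the difference between the two encodings concrete, here is a rough sketch that lists the bytes each one produces for the same text. The helper names toLatin1Bytes and toUtf8Bytes are hypothetical and have nothing to do with Webpublisher itself; the UTF-8 bytes are read out of the percent-escapes that encodeURIComponent produces.

```javascript
// hypothetical helpers for illustration only

// ISO-8859-1: one byte per character, and the byte value equals the character code
var toLatin1Bytes = function(text) {
    var bytes = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 255) {
            throw new Error("not representable in ISO-8859-1: " + text.charAt(i));
        }
        bytes.push(code);
    }
    return bytes;
};

// UTF-8: encodeURIComponent escapes text as UTF-8 bytes, so read them back out
var toUtf8Bytes = function(text) {
    var escaped = encodeURIComponent(text);
    var bytes = [];
    for (var i = 0; i < escaped.length; i++) {
        if (escaped.charAt(i) === "%") {
            bytes.push(parseInt(escaped.substring(i + 1, i + 3), 16));
            i += 2;
        } else {
            bytes.push(escaped.charCodeAt(i));
        }
    }
    return bytes;
};

var latin1 = toLatin1Bytes("caf\u00e9"); // [99, 97, 102, 233]
var utf8 = toUtf8Bytes("caf\u00e9");     // [99, 97, 102, 195, 169]
```

The plain letters come out the same in both, but the accented e is one byte in ISO-8859-1 and two in UTF-8, which is why a page read with the wrong encoding shows garbage in place of accented characters. A character outside the western Latin set, such as the euro sign (\u20ac), has a UTF-8 form but makes toLatin1Bytes throw.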

The de facto standard for web pages these days is UTF-8 encoding, for the obvious reason that it supports the widest possible range of characters. The browser depends on the web page itself to tell it which encoding to use. If the web page is remiss in sending encoding instructions, then the browser is forced to fall back to a default, which may differ from browser to browser. (Especially old browsers.) But that would only be the case for a web page or website that is not well designed! A good design completely controls the encoding, from the encoding a web page file is saved in to the content-type header the HTTP server emits, and the two must match! A meta tag that sets content-type is overridden by the HTTP header, so it only matters when no header is present, as in offline viewing.

Webpublisher serves up data dynamically, so there is no web page file to worry about. However, it does by default encode with ISO-8859-1, and one must explicitly tell it to encode with UTF-8 if that is desired.

Query string parameters that govern encoding

OEH=UTF-8 (cars query example)
This query string parameter tells Webpublisher to emit HTML encoded in UTF-8. Sure enough, Webpublisher adds a content-type header set to "text/html; charset=UTF-8". Interestingly, Webpublisher does not set any content-type header by default for HTML, so it's best to always supply an OEH value.

OEX=UTF-8 (cars query example)
This query string parameter tells Webpublisher to emit XML encoded in UTF-8. Webpublisher correctly sets the encoding attribute in the XML declaration to <?xml encoding="UTF-8" ?>. However, it also adds a content-type header set to "text/xml; charset=ISO-8859-1" on every XML response, regardless of OEX value, which appears to be incorrect.

Further reading:

The Definitive Guide to Web Character Encoding by Tommy Olsson (2007)
http://www.sitepoint.com/article/guide-web-character-encoding/

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky (2003)
http://www.joelonsoftware.com/articles/Unicode.html

Presto

Inmagic Presto supports Unicode, and defaults to UTF-8 encoding. Yay!

 
