Monday, February 18, 2008 2:34 PM
In a recent article, Chris Maunder of www.codeproject.com points out another essential reason to run a validator on your site: Google indexing. (Yes, you might notice that our current, out-of-date Andornot site fails validation. A complete rewrite using Umbraco is nearing the finish line, and it does validate.) You can read more in the original article (in fact, please do so), but basically, declaring a DOCTYPE that did not match the markup led to Google reporting the page as not found:
The problem we faced was that we had specified HTML 4.01 in our DOCTYPE but were trying to use XHTML style tags (i.e. self closing meta tags). IE had no problem with it. Firefox, Opera and even my blackberry had no problem with it (or if they did have a problem they were polite enough not to say anything). Yahoo didn't have a problem.
But Google did.
Google saw the DOCTYPE as being HTML 4.01. It then saw meta tags with a trailing "/>". It became scared and confused and decided that the only thing to do was report the page as not found.
Our custom error page had no meta tags so it was fine. Our article about templates had <> in the title which caused Googlebot enough confusion that it forgot about the self-closing meta tags and indexed the article.
We removed the "/"s from the meta tags and within 24hrs we were reindexed.
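For anyone who wants a quick local check for this particular mismatch, here is a minimal sketch in Python using the standard library's html.parser module. The class and function names (SelfClosingTagChecker, find_doctype_mismatches) are my own, not something from Chris's article, and the check only looks for XHTML-style self-closing tags under an HTML 4.01 DOCTYPE. It does not reproduce whatever Googlebot actually does, and it is no substitute for running the W3C validator.

```python
from html.parser import HTMLParser


class SelfClosingTagChecker(HTMLParser):
    """Records the first DOCTYPE and every tag written as <tag ... />."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.self_closing = []  # list of (tag name, line number)

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE ...>
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_startendtag(self, tag, attrs):
        # Called only for XHTML-style empty tags such as <meta ... />
        line, _column = self.getpos()
        self.self_closing.append((tag, line))


def find_doctype_mismatches(html_text):
    """Return self-closing tags found in a page that declares HTML 4.01.

    Hypothetical helper: returns an empty list when the DOCTYPE is
    missing, is XHTML, or is anything other than HTML 4.01.
    """
    checker = SelfClosingTagChecker()
    checker.feed(html_text)
    checker.close()
    doctype = (checker.doctype or "").upper()
    if "XHTML" in doctype or "HTML 4.01" not in doctype:
        return []
    return checker.self_closing


if __name__ == "__main__":
    page = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html><head>\n"
        '<meta name="description" content="example" />\n'
        "<title>Example</title></head><body></body></html>"
    )
    for tag, line in find_doctype_mismatches(page):
        print(f"line {line}: self-closing <{tag} /> under an HTML 4.01 DOCTYPE")
```

Running it against a saved copy of a page lists each offending tag with its line number, which makes it easy to strip the trailing slashes the way the Code Project team did.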
The Code Project team had several other issues to fix beyond the self-closing meta tags, but again, to quote Chris:
W3C has HTML validators that we knew we failed but there was always a feeling of "so what?" The site rendered perfectly fine on the browsers we had access to. Why should this cause a problem?
As The Code Project found out firsthand, paying attention to validation errors helps catch many problems and potential pitfalls. It's not the be-all and end-all, but if you think validation is just a theoretical exercise with no real-world implications, think again.