Andornot Consulting Inc.

Thursday, December 20, 2007

Could Google index a Webpublisher database?

I hear people ask this question a lot: "Can Google index my Inmagic Webpublisher database?" And so far, the answer has been a disappointing "nope". Database records are only reachable via complicated URLs with long query strings, and search engine policy is to ignore such URLs.

But I had a thought today while listening to a .NET Rocks! podcast about the recent ASP.NET implementation of the MVC (model-view-controller) pattern. One of its big strengths is the ability to work with predictable and logical URLs that don't have to correspond to a file on disk, negating the need for clever URL rewriting. Jeff Palermo, the expert interviewee on the subject, cited an example URL an online retailer might use to make its widget products indexable by search engines:

/widgetCategory/widget1/red

And I thought, why not do this for Webpublisher? Not put the MVC pattern into practice necessarily, but use URLs like that to produce Webpublisher query results. With clever URL rewriting. (I.e. handle the URL behind the scenes server-side, not merely *redirect*.)

For example, /catalog/title/the_golden_compass could search for Title = "The Golden Compass" in the Catalog database. The path would contain everything needed to construct the query: the database, the field, and the search value. More examples:

/catalog/author/smith 

/catalog/recordid/123

/catalog/datemodified/2008-01-01
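
To make the path-to-query idea concrete, here is a minimal Python sketch of the parsing step only. The field whitelist, the function names, and the underscore-to-space convention are all assumptions made for illustration; how the parsed parts would be turned into an actual Webpublisher query is deliberately left out.

```python
from urllib.parse import unquote

# Hypothetical whitelist: URL segment -> field label used in the query.
FIELDS = {
    "title": "Title",
    "author": "Author",
    "recordid": "RecordID",
    "datemodified": "DateModified",
}


def parse_friendly_path(path):
    """Split a /database/field/value path into the parts of a query."""
    segments = [unquote(s) for s in path.strip("/").split("/")]
    if len(segments) != 3:
        raise ValueError(f"expected /database/field/value, got {path!r}")
    database, field, value = segments
    if field.lower() not in FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    return {
        "database": database.lower(),
        "field": FIELDS[field.lower()],
        # Assumed convention: underscores in the URL stand for spaces.
        "value": value.replace("_", " "),
    }


def describe_query(parts):
    """Render the parsed parts as a human-readable query description."""
    return f"{parts['field']} = '{parts['value']}' in the {parts['database']} database"


if __name__ == "__main__":
    for p in ("/catalog/title/the_golden_compass", "/catalog/recordid/123"):
        print(describe_query(parse_friendly_path(p)))
```

Rejecting unknown fields up front keeps arbitrary path segments from reaching the database layer, which seems a sensible default for any rewriting scheme like this.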

One would then need to produce some kind of site map that gave Google the links to follow.
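
As a rough illustration of that site map, the Python sketch below writes a list of friendly paths into the XML sitemap format (the `urlset`/`url`/`loc` elements of the sitemaps.org 0.9 schema). The base URL and the list of paths are placeholders; in practice the paths would have to be generated from the database records themselves.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return a sitemap XML string with one <url><loc> entry per path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder host and paths, for illustration only.
    print(build_sitemap("https://www.example.com", [
        "/catalog/author/smith",
        "/catalog/recordid/123",
    ]))
```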

I bet this would work. I'll try it in the next few weeks and report back.

4 Comments:

Anonymous said...

We've discovered that Google indexes our databases if we have canned queries on our web pages. It seems to be able to handle those long URLs just fine. But it would be cool if it could index stuff not found via our canned queries too, so I'm eager to hear more about your idea, Peter!

11:21 AM  
Blogger Peter Tyrrell said...

Huh. I admit I am surprised to hear that Google follows the links with heavy query strings. What does the Google hit look like? I bet we could at least improve its readability, eh?

11:44 AM  
Blogger Jeffrey Palermo said...

I don't see any reason that would not work. With the MVC Framework, URLs are first-class citizens, whereas now, you would have to handle all the URL rewriting yourself.

6:43 PM  
Anonymous Dave C said...

You can just create an xml sitemap of anything you want indexed.
Google describe the format here:
https://www.google.com/webmasters/tools/docs/en/protocol.html

It's no more difficult to crank out that XML than it is to generate RSS from your data. Indeed, if you already generate feeds of data, just throw together some XSLT to produce a sitemap.

7:33 AM  
