Inmagic Webpublisher and character encoding

by Peter Tyrrell Wednesday, March 04, 2009 1:40 PM

Discussions of character sets and encodings are full of potentially confusing information, so here's how it all relates to Inmagic DB/Text and Webpublisher, according to my understanding.

Inmagic DB/Text supports the western Latin character set, which covers the characters used in Western European languages. (Roughly, the characters you would find in the Windows "ANSI" code page or ISO-8859-1, both of which extend 7-bit ASCII with accented letters and extra symbols.) It does not support Unicode, which is the Ultimate Character Set For Every Character Ever Conceived, Even Klingon.

Inmagic Webpublisher supports two character encodings: ISO-8859-1 and UTF-8. An encoding is a scheme for representing the characters of a character set as bytes. ISO-8859-1 encodes just the western Latin character set, while UTF-8 encodes the western Latin character set plus the rest of Unicode. Thus you can ask Webpublisher to encode DB/Text data output to the browser in either.
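To make the difference concrete, here is a minimal sketch in Python 3 (my choice of language, nothing to do with Inmagic itself). The helper name encoded_sizes is made up for illustration. It shows that a western Latin string can be written in either encoding, while text outside that range can only be written in UTF-8.

```python
def encoded_sizes(text):
    """Return the byte length of text in each encoding, or None if it cannot be encoded."""
    sizes = {}
    for name in ("iso-8859-1", "utf-8"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The encoding has no byte sequence for at least one character.
            sizes[name] = None
    return sizes


latin = "caf\u00e9"              # "cafe" with an acute accent on the e
beyond = "\u65e5\u672c\u8a9e"    # three Japanese characters, outside western Latin

print(encoded_sizes(latin))    # {'iso-8859-1': 4, 'utf-8': 5}
print(encoded_sizes(beyond))   # {'iso-8859-1': None, 'utf-8': 9}
```

Note that the accented character takes one byte in ISO-8859-1 but two in UTF-8, which is why the two encodings cannot be swapped for one another without telling the browser.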

The de facto standard for web pages these days is UTF-8 encoding, for the obvious reason that it supports the widest possible range of characters. The browser depends on the web server's response to tell it which encoding the page uses. If the response is remiss in sending encoding instructions, then the browser is forced to fall back to a default, which may differ from browser to browser. (Especially old browsers.) But that would only be the case for a web page or website that is not well designed! A good design completely controls the encoding, from the encoding a web page file is saved in to the charset in the content-type header the HTTP server emits, and the two must match! A meta tag that sets content-type only matters when no HTTP header supplies a charset, as in offline viewing.
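Here is a small Python 3 sketch of why the declared encoding has to match the actual bytes. The function name decode_as is hypothetical, and the "browser" is simulated by a plain decode call, so treat it as an illustration rather than a model of any real browser.

```python
def decode_as(payload, declared):
    """Decode bytes the way a client would if it trusted the declared encoding."""
    return payload.decode(declared, errors="replace")


# A page whose text was actually saved as UTF-8.
page_bytes = "Caf\u00e9".encode("utf-8")

# Declared encoding matches the bytes: the text survives intact.
print(decode_as(page_bytes, "utf-8"))

# Declared encoding is ISO-8859-1: each byte of the two-byte UTF-8
# sequence is shown as a separate character, so the accent turns to junk.
print(decode_as(page_bytes, "iso-8859-1"))
```

The second result is the familiar garbage you see on pages where the file encoding and the content-type header disagree.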

Webpublisher serves up data dynamically, so there is no web page file to worry about. However, it does by default encode with ISO-8859-1, and one must explicitly tell it to encode with UTF-8 if that is desired.

Query string parameters that govern encoding

OEH=UTF-8 (cars query example)
This query string parameter tells Webpublisher to emit HTML encoded in UTF-8. Sure enough, Webpublisher adds a content-type header set to "text/html; charset=UTF-8". Interestingly, Webpublisher does not set any content-type header by default for HTML, so it's best to always supply an OEH value.
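If you build query URLs in code, something like the following Python 3 sketch can make sure the parameter is always present. Only the OEH name and its UTF-8 value come from the description above; the example.com address, the query=cars parameter, and the helper name with_encoding_param are placeholders I made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_encoding_param(url, name="OEH", value="UTF-8"):
    """Return url with the given encoding parameter set exactly once."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous copy of the same name.
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != name]
    pairs.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical query URL; a real Webpublisher URL will look different.
print(with_encoding_param("http://example.com/webpublisher?query=cars"))
# http://example.com/webpublisher?query=cars&OEH=UTF-8
```

Passing name="OEX" does the same for XML output.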

OEX=UTF-8 (cars query example)
This query string parameter tells Webpublisher to emit XML encoded in UTF-8. Webpublisher correctly sets the encoding in the XML declaration to <?xml encoding="UTF-8" ?>. However, it also adds a content-type header set to "text/xml; charset=ISO-8859-1" on every XML response, regardless of OEX value. That appears to be incorrect, because the header then contradicts the declaration inside the document itself.
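A client that consumes the XML can at least detect this kind of disagreement. The Python 3 sketch below compares the charset in a content-type header with the encoding named in the XML declaration. The function names are hypothetical, the header and body values are hand-written stand-ins for a response rather than captured Webpublisher output, and the regular expression is a rough check, not a full XML parser.

```python
import re
from email.message import Message

# Rough pattern for the encoding named in an XML declaration.
XML_ENCODING = re.compile(rb"""\s*<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def header_charset(content_type):
    """Return the lowercased charset from a content-type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_xml_encoding(body):
    """Return the lowercased encoding from the XML declaration, or None."""
    match = XML_ENCODING.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def encodings_agree(content_type, body):
    """True unless both places name an encoding and the names differ."""
    header = header_charset(content_type)
    declared = declared_xml_encoding(body)
    return header is None or declared is None or header == declared


# Stand-in values mirroring the mismatch described above.
header = "text/xml; charset=ISO-8859-1"
body = b'<?xml encoding="UTF-8" ?><records/>'
print(header_charset(header), declared_xml_encoding(body), encodings_agree(header, body))
```

When the two disagree, which one a given XML parser trusts depends on the parser, so it is safer to decode the bytes yourself using the encoding you asked for.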

Further reading:

The Definitive Guide to Web Character Encoding by Tommy Olsson (2007)

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky (2003)


Inmagic Presto supports Unicode, and defaults to UTF-8 encoding. Yay!

