Love thy neighbour: use UTF-8

I run UK-based websites which have international reach.
Specifically, a travel essentials site with more than 5k pages indexed by Google gets exposure to Europe and its various languages.
A while back I started allowing non-UK advertising on this site.
To my surprise there was quite a take-up from German and Polish sites, and from sites in other, less common languages.

They do have one thing in common: they use special characters such as umlauts, accents, superscripts etc.
The standard
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
is limited in its ability to display and handle these.

After digging around a bit and testing, I have now switched the site to use UTF-8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
No, for a mostly ASCII site it does not use noticeably more bandwidth, but there are some things worth noting.

Here is some blurb about encodings and their bandwidth implications:

With UTF-16, all characters (at least, all the ones you're likely to use) require 2 octets, so the phrase "Hello, world!" will use 26 octets to store 13 characters. This only pays off if one uses complex scripts like Chinese, and it has obvious bandwidth implications for mostly English text.

With UTF-8, basic ASCII characters require only 1 octet, so "Hello, world!" would use only 13 octets to store 13 characters; however, Cyrillic or accented European characters require 2 octets each, and characters for other languages can require up to 4 octets (older descriptions of UTF-8 allowed for up to 6).
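The octet counts quoted above are easy to check. Here is a minimal Python sketch; the sample strings and the helper name octets are my own choices for illustration, not part of any standard.

```python
# Count how many octets a string needs in a given encoding.
# The helper name and the sample strings are illustrative assumptions.

def octets(text, encoding):
    """Return the number of octets `text` occupies in `encoding`."""
    return len(text.encode(encoding))

samples = ["Hello, world!", "ü", "м"]

for text in samples:
    # "utf-16-le" is used instead of "utf-16" so that the 2-octet
    # byte order mark Python would otherwise prepend is not counted.
    print(repr(text),
          "utf-8:", octets(text, "utf-8"),
          "utf-16:", octets(text, "utf-16-le"))
```

For a site that is mostly plain English with the odd umlaut, the UTF-8 column stays very close to one octet per character.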

With the ISO-8859 character encodings, each character requires only 1 octet, even accented European characters, but any characters not in the set require HTML/XML character references, such as &#1084; for м.

So a ü written directly in UTF-8 (2 octets) is more efficient than the entity &uuml; (6 octets).
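To put numbers on the comparison between writing a character directly and writing it as a character reference, here is a small Python sketch. It only uses the standard library; the variable names are mine.

```python
import html

# A character written directly in UTF-8 versus as an HTML reference.
umlaut = "ü"
raw_utf8 = umlaut.encode("utf-8")          # the character itself in UTF-8
named_entity = "&uuml;".encode("ascii")    # the named entity form

# A character outside ISO-8859-1 has to become a numeric reference there.
cyrillic = "м"
numeric_ref = cyrillic.encode("ascii", errors="xmlcharrefreplace")

print("ü as UTF-8:", len(raw_utf8), "octets")
print("ü as entity:", len(named_entity), "octets")
print("м as UTF-8:", len(cyrillic.encode("utf-8")), "octets")
print("м as reference:", numeric_ref, len(numeric_ref), "octets")

# Both forms mean the same character once the browser has decoded them.
print(html.unescape("&uuml;") == umlaut)
```

The entity forms only win when the page itself is served as ISO-8859-1 and the character is already in that set, in which case the raw character is a single octet anyway.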

The placement of the meta tag is important

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

But that meta tag really has to be the very first thing in the <head> section, because as soon as the browser sees this tag it's going to stop parsing the page and start over after reinterpreting the whole page using the encoding you specified.

So one doesn't want a head that is interpreted once, only to be parsed all over again.
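A quick way to see how far into a page the charset declaration sits is to search the raw bytes for it. The Python sketch below is a rough check, not a real HTML parser: the function name, the regular expression and the two sample pages are all made up for illustration. The 1024-byte limit is the one the current HTML standard gives for where the declaration has to appear.

```python
import re

# Matches a meta tag carrying a charset, either in the http-equiv form
# or in the shorter <meta charset="..."> form. Deliberately simplistic.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)

def find_charset_declaration(page):
    """Return (charset, byte offset of the meta tag), or None if absent."""
    match = META_CHARSET.search(page)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower(), match.start()

# Hypothetical pages: one declares the charset first, one buries it.
early = (b'<html><head>'
         b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
         b'<title>Example</title></head></html>')
late = (b'<html><head><title>Example</title>'
        + b'<!-- filler -->' * 100
        + b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        b'</head></html>')

for name, page in (("early", early), ("late", late)):
    charset, offset = find_charset_declaration(page)
    verdict = "ok" if offset < 1024 else "too late"
    print(name, charset, "at byte", offset, verdict)
```

Running something like this over a saved copy of a template is an easy way to catch a title or script that has crept in above the meta tag.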

Also, since I work with UK-based systems and standards and an older version of Dreamweaver (DW) to edit the pages, I use a blank template to generate new pages (if ever).

If one uses special characters that are not on the keyboard, it is advisable to check their UTF-8 encoding and use that instead of the country-specific standard. I found this out the hard way when using the superscript degree sign, which is encoded differently in ISO-8859-1 than in UTF-8.
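The degree sign makes a good worked example. The Python sketch below assumes the character in question is the ordinary degree sign, U+00B0, and shows what happens when bytes written in one encoding are read as the other.

```python
# Assumption: the "superscript degree" is the degree sign, U+00B0.
degree = "\u00b0"

iso_bytes = degree.encode("iso-8859-1")   # one octet: b'\xb0'
utf8_bytes = degree.encode("utf-8")       # two octets: b'\xc2\xb0'
print(iso_bytes, utf8_bytes)

# UTF-8 bytes read as ISO-8859-1: a stray extra character appears.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # 'Â°'

# ISO-8859-1 bytes read as UTF-8: the lone 0xB0 octet is not valid UTF-8.
try:
    iso_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print("ISO bytes valid as UTF-8:", decoded_ok)
```

So a page saved by an editor in ISO-8859-1 but declared as UTF-8 (or the other way round) will show either a broken character or an extra Â in front of the degree sign.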

Otherwise I'm pleased that umlauts and other special characters now seem to show correctly.

I'm eager to see whether these are now slurped up correctly by the search engines, and will keep you posted.

