Re: Charsets + encoding + codesets

From: Kenneth Whistler (
Date: Wed Oct 08 1997 - 13:12:52 EDT

Martin asked, in response to my example of what I would
like to see in a consistent registry of encoded character sets:

> Where did you get your short tags from? The largest and most widely
> used collection of tags in this area is the IANA "charset" registry.
> At least three of your four short tags are wrong in this respect;
> they should be iso-8859-1, utf-8, and utf-16.

I made them up. The whole point was to have a *short*, consistently
constructed tag for identification purposes within this table,
rather than the IANA tag, for several reasons:

  1. This is an excerpt from a large spreadsheet of such things,
     and many entries in the table have never been registered
     with IANA, and thus do not have an IANA tag.

  2. The IANA tags are whatever they got registered as, which means
     they are not consistently generated, and are not always short.
     (My fave is: "Extended_UNIX_Code_Packed_Format_for_Japanese",
     but I also dislike the years appended to all the 8859 parts:
     "ISO_8859-9:1989", etc. MIME substitutes "ISO-8859-9".)

Think of the short tags as another set of aliases for the IANA
registry, if you will.
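A minimal sketch of that alias idea, assuming a plain dictionary keyed by
short tag. The short tags and the `LOCAL1` entry are hypothetical, invented
only for illustration; the values are IANA registered names, with None
standing in for a table entry that has no IANA registration.

```python
from typing import Optional

# Hypothetical short tags (illustrative only) mapped to IANA "charset" names.
# None marks an entry with no IANA registration, hence no IANA tag.
SHORT_TO_IANA = {
    "8859-1": "ISO_8859-1:1987",
    "8859-9": "ISO_8859-9:1989",
    "EUCJP": "Extended_UNIX_Code_Packed_Format_for_Japanese",
    "UTF8": "UTF-8",
    "UTF16": "UTF-16",
    "LOCAL1": None,  # hypothetical unregistered entry
}


def iana_name(short_tag: str) -> Optional[str]:
    """Return the IANA name for a short tag, or None if it has none.

    Lookup is case-insensitive; an unknown short tag raises KeyError.
    """
    return SHORT_TO_IANA[short_tag.upper()]


print(iana_name("utf8"))  # UTF-8
```

The short tag serves only as an internal key; anything exchanged under a
standard that specifies IANA "charset" values would still carry the
registered name on the right-hand side.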

I am not suggesting that anybody start using yet another set of tags
in a context where the IANA "charset" tags are specified. By all means,
if using a standard that refers to the IANA "charset" values, use the
correct tags as defined there.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:37 EDT