Re: 2 dumb questions: Plane 14 and codepages

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Jun 30 2000 - 17:17:54 EDT


Mike Newhall wrote:
> 1. What are "plane 14 language tags"?

they are a set of characters that mirror the ascii graphic characters, placed in what iso 10646 calls plane 14. for unicoders, plane 14 means the code points from U+E0000 to U+EFFFF (note that hex 'E' is decimal 14). these characters are used for tagging text when you have neither markup nor out-of-band information, and are intended for language tags like "de-AT". the ietf wanted such in-band information, while most unicoders don't like them...
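
a minimal python sketch of that mirroring, to make the arithmetic concrete. it assumes the layout from the plane 14 proposal (rfc 2482): each ascii graphic character at U+00xx has a tag twin at U+E00xx, and a language tag string starts with U+E0001 LANGUAGE TAG. the function name and constants are made up for illustration.

```python
# hypothetical helper: build a plane 14 language tag string from an ascii tag
TAG_BASE = 0xE0000          # tag characters sit at U+E0000 + ascii code
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (assumed from rfc 2482)


def to_language_tag(tag: str) -> str:
    """Return the in-band tag sequence for an ascii language tag like 'de-AT'."""
    for ch in tag:
        # only the ascii graphic range (space through tilde) has tag twins
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not an ascii graphic character: {ch!r}")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(ch)) for ch in tag)


tagged = to_language_tag("de-AT")
code_points = [f"U+{ord(ch):05X}" for ch in tagged]
print(code_points)
# ['U+E0001', 'U+E0064', 'U+E0065', 'U+E002D', 'U+E0041', 'U+E0054']
```

every code point in the result falls inside U+E0000..U+EFFFF, which is all that "plane 14" means here.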

> 2. Just out of historical curiosity, what is a codepage / how did the name
> / numbers originate? I have over the years used the inferred definition
> that it is an 8-bit character set selected by a number, but...

it is not necessarily 8-bit: it can also be double-byte, mixed-byte, etc., up to four bytes per character.

every company and standards organization basically mapped their favorite sets of characters to their favorite byte combinations, and each such mapping is called a codepage. i am not sure how best to compare it with modern terminology, but i believe most people treat it as equivalent to "charset" or "character encoding form/scheme", and sometimes only to "coded character set" (see the rfcs and the unicode technical report on the character encoding model).
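
as a rough illustration of "characters mapped to byte combinations" of different widths, here is a small python sketch. the codec names (cp1252, shift_jis, gb18030) are simply ones that ship with python and are my choice of examples, not anything from the lists discussed here; the helper name is hypothetical.

```python
# sample characters chosen so that each codec needs a different number of bytes
SAMPLES = {
    "cp1252": "\u00e9",        # e-acute: a single-byte codepage
    "shift_jis": "\u65e5",     # a kanji: two bytes
    "gb18030": "\U00010000",   # a supplementary character: four bytes
}


def encoded_lengths(samples):
    """Map each codec name to the byte length of its sample character."""
    return {codec: len(text.encode(codec)) for codec, text in samples.items()}


lengths = encoded_lengths(SAMPLES)
print(lengths)

# mixed-byte: in shift_jis the ascii letter takes one byte, the kanji two
mixed = "A\u65e5".encode("shift_jis")
print(len(mixed))
```

the point is only that a "codepage" in this sense is a table from characters to byte sequences, and the sequences need not all be one byte long.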

> - Is this really the complete and accurate definition of what a 'codepage'
> is?
> - Are these #'s always OS-specific, or sometimes standardized?

ibm has a list, microsoft has one, apple has one, the iana list has both names and "MIB enums", MIME has a list, ...
sometimes the same number means something more or less related across these lists, though without real coordination, and other times the same number means something entirely different. this makes it hard to exchange text in many of them.
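
to show why exchange gets hard when sender and receiver disagree about the codepage, here is a small python sketch that decodes the same bytes under three codepages. the choice of cp437, cp850 and cp1252 (python's names for them) and the sample bytes are assumptions for illustration only.

```python
# the same four bytes, as they might arrive with no codepage label attached
RAW = b"caf\xe9"
CODEPAGES = ("cp437", "cp850", "cp1252")


def decode_under(raw, codepages):
    """Decode the same bytes under each codepage and collect the readings."""
    return {cp: raw.decode(cp) for cp in codepages}


readings = decode_under(RAW, CODEPAGES)
for cp, text in readings.items():
    # only one of these readings is the word the sender meant
    print(cp, text)
```

the ascii part survives everywhere, but the last byte comes out as a different character under each codepage, so the receiver has to know which mapping the sender used.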

unicode was created to be so well-defined and so widely applicable that we would not need "legacy" codepages any more, but the above institutions (except microsoft?) continue to create new ones...

> - Is there a rhyme or reason to the number assignments? It seems that
> they were not assigned in sequence, unless each OS has hundreds or
> thousands of code pages.

each organization has at least dozens, if not hundreds or thousands :-)
i don't know of a particular reason for the number values. the ibm values need to fit into 16 bits, with a few values reserved.

> - Where did the term originate? It seems to have a hardware flavor, as if
> an old piece of display hardware had selectable ROMed fonts.

i am guessing it is from printed manuals?

> Mike Newhall
> AltaVista

markus scherer
ibm (icu)


