Re: U+xxxx, U-xxxxxx, and the basics

From: Paul Keinanen (keinanen@sci.fi)
Date: Sat Mar 04 2000 - 06:16:27 EST


On Fri, 3 Mar 2000 16:57:21 -0800 (PST), Mike Brown
<mbrown@corp.webb.net> wrote:

>Aside from the Universal Character Set shared by the Unicode Standard and
>ISO 10646-1, other popular coded character sets include US-ASCII (128
>abstract characters mapped to scalar values in the range 0x0..0x7F) and
>ISO-8859-1 (US-ASCII plus another 96 abstract characters mapped to scalar
>values in the range 0xA0..0xFF).

This is a bit inconsistent, since in US-ASCII the control characters
0x00 .. 0x1F (and the DEL "control character") are included in the
count, but the "8-bit controls" (IND, CSI, DCS etc.) are not included
in the count for ISO-8859-1.

It would be more consistent either to count only the printable
characters of US-ASCII (94, 95 or 96 of them, depending on whether the
range starts at 0x20 or 0x21 and ends at 0x7E or 0x7F; take your pick)
_or_ to count all 128 additional positions (0x80 .. 0xFF) for
ISO-8859-1.
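
To make the two ways of counting concrete, here is a minimal sketch in
Python. It assumes that Python's "latin-1" codec (which maps every byte
value to the code point of the same number) and the Unicode general
category Cc are an acceptable stand-in for "control character"; the
function name is only illustrative.

```python
import unicodedata


def count_controls_and_others(first, last):
    """Return (controls, non_controls) for byte values first..last inclusive."""
    # Decoding as latin-1 maps each byte value to the same code point,
    # so 0x80..0x9F become U+0080..U+009F (the "8-bit controls").
    text = bytes(range(first, last + 1)).decode("latin-1")
    controls = sum(1 for ch in text if unicodedata.category(ch) == "Cc")
    return controls, len(text) - controls


lower_half = count_controls_and_others(0x00, 0x7F)
upper_half = count_controls_and_others(0x80, 0xFF)

print(lower_half)  # (33, 95): 0x00..0x1F plus DEL, then 0x20..0x7E
print(upper_half)  # (32, 96): 0x80..0x9F, then 0xA0..0xFF
```

Counted one way, the comparison is 128 positions against 128 further
positions; counted the other way, it is 95 against 96 (SPACE included).
Mixing "128" with "96" combines the two.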

>Here are 3 ways of representing the Unicode scalar value of the abstract
>character named "ANGSTROM SIGN":
> * in a hexadecimal notation: 0x212B

Has this 0x notation been previously defined to mean hexadecimal
notation?

There are a large number of ways of representing hexadecimal numbers,
such as 212B, $212B, 212Bh etc., so unless a convention has been
previously adopted, it would be more accurate to talk about C-language
hex notation.
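
As a rough illustration of how many spellings a reader might have to
recognise, the following Python sketch accepts several of them and
returns the same integer. The function name and the exact set of
accepted forms are assumptions made for this example, not an
established convention.

```python
import re

# Each pattern captures the hex digits of one notation.
_HEX_NOTATIONS = [
    re.compile(r"^0[xX]([0-9A-Fa-f]+)$"),     # C-language style: 0x212B
    re.compile(r"^\$([0-9A-Fa-f]+)$"),        # dollar prefix:    $212B
    re.compile(r"^([0-9A-Fa-f]+)[hH]$"),      # h suffix:         212Bh
    re.compile(r"^U\+([0-9A-Fa-f]{4,6})$"),   # Unicode style:    U+212B
    re.compile(r"^([0-9A-Fa-f]+)$"),          # bare digits:      212B
]


def parse_hex(text):
    """Return the integer value of text written in any notation above."""
    for pattern in _HEX_NOTATIONS:
        match = pattern.match(text)
        if match:
            return int(match.group(1), 16)
    raise ValueError("unrecognised hexadecimal notation: %r" % text)


for spelling in ("0x212B", "$212B", "212Bh", "U+212B", "212B"):
    print(spelling, parse_hex(spelling))  # each line ends with 8491
```

Note that the bare form is ambiguous: "1234" is read here as hexadecimal
(4660), although nothing in the string itself says so. That is the
reason for stating the convention before using it.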

Paul


