Re: Mixed up priorities

From: Ashley Yakeley (ashley@semantic.org)
Date: Fri Oct 22 1999 - 04:40:29 EDT


At 1999-10-21 20:50, G. Adam Stanislav wrote:

>Yes, it is possible to encode the CH as the C followed by the H, and the N
>caron by the N followed by some connection code followed by a caron.

Actually, N-caron can be encoded as a codepoint for N followed by a
codepoint for 'combining caron'. These 'combining codepoints' modify the
character suggested by the previous codepoint, more or less.
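
As a rough illustration of the sequence described above, here is a small Python sketch using the standard `unicodedata` module. The variable names are only for illustration; the point is that the string holds two codepoints, and the second one has a nonzero combining class, which marks it as a combining codepoint.

```python
import unicodedata

# N followed by 'combining caron': two codepoints, one character.
n_caron = "\u004E\u030C"

# Look up the standard name of each codepoint in the sequence.
names = [unicodedata.name(c) for c in n_caron]
print(len(n_caron), names)

# A nonzero canonical combining class means the codepoint attaches
# to the base character before it rather than standing alone.
is_combining = unicodedata.combining(n_caron[1]) != 0
print(is_combining)
```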

>And it
>is perfectly possible for software to handle it. But that would not be
>CHARACTER encoding.

Unicode is intended to encode text as character-streams, rather than
glyphs, but it certainly does not in general encode one character per
codepoint.

N-caron is a character.
Unicode encodes characters.
Unicode encodes N-caron as a sequence of two codepoints.

Now some characters, such as 'M', can be encoded using one codepoint.
Some, such as 'à' (a-grave), can be encoded in several ways.
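
To make "several ways" concrete, here is a minimal Python sketch for a-grave, again using the standard `unicodedata` module. It compares the single-codepoint form (U+00E0) with the two-codepoint form (U+0061 U+0300) and shows that normalization maps one onto the other. The variable names are illustrative only.

```python
import unicodedata

precomposed = "\u00E0"   # a-grave as one codepoint
decomposed = "a\u0300"   # 'a' followed by 'combining grave accent'

# The two strings differ codepoint by codepoint...
same_codepoints = precomposed == decomposed

# ...but normalization converts each form into the other.
composed_again = unicodedata.normalize("NFC", decomposed)
decomposed_again = unicodedata.normalize("NFD", precomposed)

print(same_codepoints, composed_again == precomposed,
      decomposed_again == decomposed)
```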

>Unicode clearly states its goal to be the encoding of
>characters of all languages, existing and defunct.

Correct.

>CH is a character in Slovak.

Correct. Unicode encodes that Slovak character as U+0043 U+0048.
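
A small Python sketch of what that looks like to software, under stated assumptions: `slovak_letters` is a hypothetical helper written for this message, not part of any library, and it only knows about the CH digraph (it ignores combining marks and other Slovak digraphs). It shows the two codepoints for CH and one way a program could still treat them as a single letter.

```python
# The Slovak letter CH is stored as two ordinary codepoints.
codepoints = [f"U+{ord(c):04X}" for c in "CH"]
print(codepoints)

def slovak_letters(word):
    """Hypothetical helper: split a word into letters, keeping CH together."""
    letters = []
    i = 0
    while i < len(word):
        # Treat C followed by H (in any case) as one letter.
        if word[i:i + 2].upper() == "CH":
            letters.append(word[i:i + 2])
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters

letters = slovak_letters("CHLIEB")
print(letters)
```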

-- 
Ashley Yakeley, Seattle WA


