Re: Mixed up priorities

From: Christopher John Fynn (cfynn@dircon.co.uk)
Date: Fri Oct 22 1999 - 05:29:40 EDT


Ashley Yakeley wrote:

> Unicode is intended to encode text as character-streams, rather than
> glyphs, but it certainly does not in general encode one character per
> codepoint.

What? Is that official?

> N-caron is a character.
> Unicode encodes characters.
> Unicode encodes N-caron as a sequence of two codepoints.

> Now some characters, such as 'M', can be encoded using one codepoint.
> Some, such as 'à' (a-grave), can be encoded in several ways.
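
As a small illustration of the a-grave case quoted above, here is a sketch in Python using the standard `unicodedata` module. The helper name `code_points` is made up for this example. It only shows that the same letter can be stored either as one code point or as a base letter plus a combining mark, and that normalization maps between the two forms.

```python
import unicodedata

precomposed = "\u00e0"   # LATIN SMALL LETTER A WITH GRAVE
decomposed = "a\u0300"   # LATIN SMALL LETTER A + COMBINING GRAVE ACCENT


def code_points(text):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))   # ['U+00E0']
print(code_points(decomposed))    # ['U+0061', 'U+0300']

# The two strings differ code point by code point ...
print(precomposed == decomposed)  # False

# ... but normalization converts one form into the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```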

>> Unicode clearly states its goal to be the encoding of
>> characters of all languages, existing and defunct.

> Correct.

>> CH is a character in Slovak.

> Correct. Unicode encodes that Slovak character as U+0043 U+0048.
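
To make the quoted claim concrete, here is a rough Python sketch of what treating the sequence U+0043 U+0048 as one letter could look like at the sorting level. Everything in it is an assumption made for illustration: the function name `slovak_like_key` is invented, the word list is arbitrary, and the key ignores diacritics and the rest of real Slovak or Czech collation. It only shows the digraph being ordered as a unit after "h" and before "i".

```python
def slovak_like_key(word):
    """Hypothetical sort key: treat the digraph 'ch' as a single unit
    that sorts after 'h' and before 'i'. Not real Slovak collation."""
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        if lowered.startswith("ch", i):
            # Same primary position as 'h', but ranked just after it.
            key.append((ord("h"), 1))
            i += 2
        else:
            key.append((ord(lowered[i]), 0))
            i += 1
    return key


words = ["idea", "chata", "hora", "cesta", "dom"]

# Plain code point order puts "chata" among the c-words.
print(sorted(words))
# ['cesta', 'chata', 'dom', 'hora', 'idea']

# With the digraph-aware key, "chata" falls between "hora" and "idea".
print(sorted(words, key=slovak_like_key))
# ['cesta', 'dom', 'hora', 'chata', 'idea']
```

The stored text is unchanged in both cases; only the comparison rule differs.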

So you accept Stanislav's argument that CH is a *character* in the Slovak and
Czech languages, but you introduce a doctrine that "Unicode ... does not in
general encode one character per codepoint". Where does the standard state this?

If the Slovak government were to decree that this "character" CH, which appears
between "H" and "I" in their dictionaries, was in future to be represented by
some entirely new unitary glyph, would it then somehow become an acceptable
candidate for encoding, or would it still have to be encoded by the sequence
U+0043 U+0048? After all, this would only be a glyph change.

If it is accepted that CH is a character in the Slovak and Czech languages, then
it seems to me the real reason for not encoding a CZECH-SLOVAK LETTER CH is
backwards compatibility with legacy systems, or the problems that would be
caused by people using it as a ligature for CH in other languages.

- Chris
(or would that be CHris?)


