Re: diaeresis/umlaut

From: Peter_Constable@sil.org
Date: Sat Jun 12 1999 - 10:51:05 EDT


>The reason is quite simple, they look the same, they have almost always been
>treated the same (except for one German character set, that was never widely
>used), and most people will not be aware of their differences, or don't care, so
>they will use one for the other, and effectively, we will just end up with two
>characters that we will have to treat the same way. For the same reason we don't
>distinguish between a decimal period and a normal period, etc.

The reason has to be a little stronger than this. For two characters to be
unified, they must

- look reasonably the same (or, if they have more than one appearance, both
appearances are never needed in a given context; e.g. Greek & Coptic, CJK),
- never be distinguished in existing encoding standards, and
- never have different behaviour that requires different treatment by any
process.
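
As a rough illustration of what this unification looks like in practice, here
is a minimal Python sketch using the standard unicodedata module. The helper
name combining_marks and the two sample words are assumptions chosen for the
example. It decomposes a German word with an umlaut and a word with a
diaeresis, and lists the combining marks found in each.

```python
import unicodedata

def combining_marks(text):
    """Return the combining marks left after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return [ch for ch in decomposed if unicodedata.combining(ch)]

# "Maedchen" written with a-umlaut (U+00E4): the dots mark an umlaut.
umlaut = combining_marks("M\u00e4dchen")
# "naive" written with i-diaeresis (U+00EF): the dots mark a diaeresis.
diaeresis = combining_marks("na\u00efve")

for mark in umlaut + diaeresis:
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}")
```

Both words yield the same mark, U+0308 COMBINING DIAERESIS, so a process
handling either one sees a single character regardless of which function the
dots serve.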

The questions of whether they have different phonetic meanings, or whether
people think of them as the same or different, are not necessarily important. To take an
extremely obvious example, the phonetic value of the letter "i" is different
between Spanish "primer" and English "primer"; ditto for every other letter
except "m". But nobody is tempted to assign separate codepoints for these. The
reason is that Unicode is encoding elements of writing, not elements of speech.

Consider IPA characters: many of them have been unified with Latin characters.
The function of the letter "a" as an element of the English alphabet is quite
different from the function of the symbol "a" in IPA. But there is never any
situation where a process can't be made to work unless the two have separate
codepoints.

>I think there are other non-spacing characters (diacritics) that have the same
>Unicode character code value but different meanings in different scripts. And
>like Mr. Figge I begin to wonder why these two meanings are not treated
>differently, like Latin A, Greek Alpha and Cyrillic A have different code
>values. Maybe someone can clarify this.

I believe the main reason that Latin A, Greek Alpha and Cyrillic A were kept
separate is round-trip convertibility with existing standards.
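
A minimal sketch of the round-trip point, in Python. ISO 8859-7 (Greek) and
ISO 8859-5 (Cyrillic) are used here purely as examples of existing standards;
they are an assumption of this sketch, as is the helper name round_trip. Each
of these standards already gives Latin A its own byte alongside Greek Alpha or
Cyrillic A, so the distinction has to survive conversion to Unicode and back.

```python
import unicodedata

def round_trip(legacy_bytes, codec):
    """Decode legacy bytes to Unicode, then encode back with the same codec."""
    text = legacy_bytes.decode(codec)
    return text, text.encode(codec)

# Latin capital A (0x41) followed by Greek capital alpha (0xC1) in ISO 8859-7.
greek_bytes = b"\x41\xc1"
# Latin capital A (0x41) followed by Cyrillic capital A (0xB0) in ISO 8859-5.
cyrillic_bytes = b"\x41\xb0"

greek_text, greek_back = round_trip(greek_bytes, "iso8859_7")
cyrillic_text, cyrillic_back = round_trip(cyrillic_bytes, "iso8859_5")

for ch in greek_text + cyrillic_text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Because the look-alike letters map to distinct code points, encoding back
gives exactly the original bytes. Had they been unified, the two bytes in each
pair would have decoded to the same character and the way back would be
ambiguous.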

For information on Unicode design principles, consult chapter 2 of the Standard.

Peter


