Re: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Mark Davis (
Date: Thu Aug 29 1996 - 06:45:13 EDT

We did not originally want to have two equivalent representations, but
for better interworking with existing sets we ended up adding the
precomposed forms. However, the standard clearly defines which sequences
are equivalent, and how to deal with them.
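
As a rough illustration of what "equivalent sequences" means in practice, here is a minimal Python sketch. It assumes the normalization forms (NFD/NFC) from later versions of the standard, as exposed by Python's standard unicodedata module; the helper name canonically_equivalent is made up for this example and is not taken from the standard.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after full canonical decomposition (NFD).

    Hypothetical helper for illustration: two strings are treated as
    equivalent when their decomposed forms are identical.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Precomposed form: LATIN SMALL LETTER E WITH ACUTE (one code point).
precomposed = "\u00e9"
# Combining sequence: "e" followed by COMBINING ACUTE ACCENT (two code points).
combining = "e\u0301"

# The raw code point sequences differ...
raw_equal = precomposed == combining
# ...but the standard defines them as equivalent.
equivalent = canonically_equivalent(precomposed, combining)
```

A comparison that goes through a helper like this treats the precomposed character and the combining sequence as the same text, which is the behavior the equivalence rules are meant to make possible.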

(BTW, in our experience, it is far easier with modern GUIs to deal with
70 or so letters and 30 or so accents than it is with the explosion of
single characters representing the possible combinations.)

Your claim about "i" is wrong: 0x0131 0x0307 is not an equivalent of 0x69. This was clarified in the
standard in 1992 (see TSR#4). If you don't have a copy of TSR#4, then
you can wait a few weeks and look at the Unicode 2.0 book. That also
contains a lot more information on implementation guidelines, which
should help to show how to deal with some of the other issues that arise
with combining characters.
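
The point about "i" can be checked against a current copy of the character database. The sketch below is only that: it assumes Python's unicodedata module, which reflects a much later version of the standard than TSR#4 or the 2.0 book, and the variable names are invented for the example.

```python
import unicodedata

# The sequence from the quoted message: dotless i + combining dot above.
dotless_plus_dot = "\u0131\u0307"
# Plain LATIN SMALL LETTER I.
plain_i = "\u0069"

# Neither character carries a decomposition mapping in the database,
# so decomposition() returns an empty string for both.
i_decomposition = unicodedata.decomposition(plain_i)
dotless_decomposition = unicodedata.decomposition("\u0131")

# Composing the two-character sequence does not yield plain "i",
# and decomposing plain "i" does not yield the sequence.
composed = unicodedata.normalize("NFC", dotless_plus_dot)
decomposed = unicodedata.normalize("NFD", plain_i)

sequences_equivalent = (
    unicodedata.normalize("NFD", dotless_plus_dot) == decomposed
)
```

Under these assumptions, sequences_equivalent comes out False: code that normalizes its input never has to treat 0x0131 0x0307 as another spelling of 0x69.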

> Well, I think one of the major flaws of Unicode is allowing a character that
> has a single 16-bit code to also be represented by combining characters.
> Better to use ISO 10646 with level 2.
> Programming would have been much simpler.
> You who have implemented Unicode routines, do you
> recognise that "i" with code 0x69 also could be defined by 0x0131 0x0307 ?
> Dan
