Re: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Dan.Oscarsson@trab.se
Date: Thu Aug 29 1996 - 04:28:21 EDT


> Michael Everson wrote:
> >Martin J Duerst wrote:
> >>Please be careful. To know whether an A is just an A, you only have
> >>to check the next position. If that next position is not a combining
> >>character, you know it is an A, if it is a combining character, you
> >>know it is "something else".
> >
> >Yes, but it's not a once-off look, is it? Because you can stack combining
> >characters. So you know it's not an A, but you have to keep looking and
> >looking and looking, don't you? Doesn't this make processing much more
> >complex than Level 1 processing?
>
There is an additional problem with having the combining characters come
after the non-combining character instead of before it: when a program reads
input interactively and wants to act on a single character, it must use a
timeout while waiting for more bytes to decide when no more combining
characters will arrive. If it had been defined that combining characters come
before the non-combining character, parsing would be simpler: the
non-combining character would end a combining sequence.
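
To make the difference concrete, here is a rough sketch in Python of the two
orderings. The function names are my own invention for illustration, and
treating every character of general category M* as "combining" is a
simplifying assumption, not a full text-segmentation algorithm.

```python
import unicodedata

def is_combining(ch):
    # Assumption: general category M* (Mn, Mc, Me) means "combining".
    return unicodedata.category(ch).startswith("M")

def clusters_suffix(stream):
    """Combining characters FOLLOW the base (the order Unicode uses)."""
    current = ""
    for ch in stream:
        if current and not is_combining(ch):
            # Only the arrival of the NEXT base shows the previous
            # sequence has ended, so one character of look-ahead is needed.
            yield current
            current = ""
        current += ch
    if current:
        # The last sequence is complete only at end of input (or a timeout).
        yield current

def clusters_prefix(stream):
    """Hypothetical order: combining characters PRECEDE the base."""
    current = ""
    for ch in stream:
        current += ch
        if not is_combining(ch):
            # The base itself ends the sequence; no look-ahead is needed.
            yield current
            current = ""
    if current:
        yield current  # dangling combining characters with no base

print(list(clusters_suffix("A\u030a\u0301B")))
print(list(clusters_prefix("\u030a\u0301AB")))
```

With the suffix order, reading "A" from a terminal cannot produce a result
until the following character has been consumed; with the hypothetical prefix
order, "A" can be handed over as soon as it arrives.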

> A truly "universal" character set can only be built on composition.
>
> As far as I can remember, Unicode accepted pre-composed characters as part of
> the great compromise with ISO 10646. It doesn't mean we have to think of them
> as anything more than a pragmatic sanction.

Well, I think one of the major flaws of Unicode is allowing a character that
has a single 16-bit code to also be represented by combining characters.
It would be better to use ISO 10646 at level 2.
Programming would have been much simpler.
You who have implemented Unicode routines, do you
recognise that "i" with code 0x69 could also be defined by 0x0131 0x0307?
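
For anyone who wants to test a pair like this, the following is a minimal
sketch using Python's standard unicodedata module. The helper name
canonically_equal is mine, and the answers depend on the Unicode data shipped
with the Python version in use.

```python
import unicodedata

def canonically_equal(a, b):
    # Two strings are canonically equivalent when their NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
print(canonically_equal(precomposed, decomposed))

# The pair from the question: "i" versus dotless i plus COMBINING DOT ABOVE.
print(canonically_equal("i", "\u0131\u0307"))
```

With the data bundled in current Python releases, the first comparison reports
equivalence and the second does not, since 0x69 carries no canonical
decomposition there. Either way, a routine that compares strings has to
normalise both sides first.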

    Dan


