Re: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Mark Leisher (
Date: Thu Aug 29 1996 - 17:16:34 EDT

    Dan> There is also an additional problem with having the combining character
    Dan> come after the non-combining one instead of before it: when reading
    Dan> interactively, a program that wants to act on a single character must
    Dan> use a timeout while waiting for more bytes to decide when no more
    Dan> combining characters will arrive. If combining characters had been
    Dan> defined to come before the non-combining one, parsing would be
    Dan> simpler: the non-combining character ends a combining sequence.

I have always been of the opinion that timeouts are poor solutions for
interactive input, although I have seen them used effectively to improve input
speed a little. However, I am more than willing to sacrifice that capability
for more generality. Besides, I have not seen a situation where that kind of
character-at-a-time input is needed for quite a while (with the exception of
ftp and telnet); GUIs have taken over the world.
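To make Dan's parsing point concrete, here is a minimal Python sketch contrasting the two orderings. The function names `group_postfix` and `group_prefix` are hypothetical, and `unicodedata.combining()` (nonzero canonical combining class) is used only as an approximate test for "combining character"; a few combining marks have class 0, so this is an assumption, not a full segmentation algorithm. With Unicode's postfix order, a sequence can only be emitted once the next base character arrives or the input ends, which is exactly the signal an interactive reader lacks and replaces with a timeout. With the prefix order Dan proposes, the base character itself closes the sequence.

```python
import unicodedata
from typing import Iterable, Iterator


def group_postfix(chars: Iterable[str]) -> Iterator[str]:
    """Group code points as base + trailing combining marks (Unicode order).

    A pending sequence is flushed only when the next non-combining
    character arrives or the input ends, so one character of lookahead
    is always needed.
    """
    pending = ""
    for ch in chars:
        if unicodedata.combining(ch) and pending:
            # A combining mark extends the sequence still being built.
            pending += ch
        else:
            # A new base character closes the previous sequence.
            if pending:
                yield pending
            pending = ch
    # End of input is the only other signal that the last sequence is complete.
    if pending:
        yield pending


def group_prefix(chars: Iterable[str]) -> Iterator[str]:
    """Group code points as leading combining marks + base (hypothetical order).

    The base character terminates the sequence, so each group can be
    emitted the moment its base is read, with no lookahead or timeout.
    """
    pending = ""
    for ch in chars:
        pending += ch
        if not unicodedata.combining(ch):
            # The non-combining character ends the combining sequence.
            yield pending
            pending = ""
    if pending:
        # Dangling marks with no base (malformed input) are passed through.
        yield pending
```

Fed from a live iterator, `group_prefix` yields each group as soon as its base character is read, while `group_postfix` holds the most recent group back until it sees the following character or end of input.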

    Dan> Well, I think one of the major flaws of Unicode is allowing a
    Dan> character that has a single 16-bit code to also be represented by
    Dan> combining characters. Better to use ISO 10646 with level 2.
    Dan> Programming would have been much simpler. Those of you who have
    Dan> implemented Unicode routines: do you recognise that "i" with code
    Dan> 0x69 could also be represented by 0x0131 0x0307?

Flaw? We actually find it quite useful to be able to decompose characters.
It comes in very handy for morphological analysis. It might appear that some
decompositions are pointless (no pun intended), but they could be useful for
something you might not have thought of.
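As a minimal sketch of the kind of decomposition Mark describes, the Python standard library's `unicodedata.normalize("NFD", ...)` splits precomposed characters into a base letter plus combining marks. The helpers `decompose` and `strip_marks` are hypothetical, and stripping marks (for example, to match a word with and without accents) is just one illustrative use, not the Computing Research Lab's actual method.

```python
import unicodedata
from typing import List


def decompose(text: str) -> List[str]:
    """Return the canonical decomposition (NFD) of text as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


def strip_marks(text: str) -> str:
    """Decompose text, then drop combining marks (hypothetical helper).

    Leaves only the base letters, e.g. for comparing word forms that
    differ only in diacritics.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For instance, `decompose("\u00e9")` gives `["U+0065", "U+0301"]` (e followed by a combining acute accent), and `strip_marks("r\u00e9sum\u00e9")` gives `"resume"`.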

Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"A designer knows he has achieved perfection not when there is nothing
left to add, but when there is nothing left to take away."
    -- Antoine de Saint-Exupéry

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT