Re: Regulating PUA.

Date: Wed Jan 24 2007 - 16:02:39 CST

    Thank you, Philippe, for a very insightful reply. I think I should frame
    it and put it on my wall. Then "some day" ...

    Quoting Philippe Verdy <>:

    > From: <>
    >> Unicode has consistently rejected this approach of putting two
    >> Chinese characters together to make a new one, and insists that each
    >> new CJKV character must be encoded separately, even though composition
    >> would cut down the number of code points required dramatically. Most
    >> Chinese characters are in fact made in this way (over 80% if one
    >> allows combinations of combinations).
    > I must acknowledge that this design choice, where the character model
    > was tweaked horribly to match the desires of existing and past
    > vendors, is somewhat flawed, and it is then difficult to understand
    > the position of the UTC and ISO WG2 regarding other scripts (Hebrew,
    > Indic scripts) that are far more complicated to implement and are
    > disadvantaged because, by contrast, a much stricter character model
    > was chosen for them.
    > Some choices like this in the character model (Thai visual ordering,
    > Hangul syllables...) at the UTC (and at ISO WG2) are clearly
    > inconsistent; they were guided only by the need to support legacy
    > applications without any adaptation, clearly against the encoding
    > policy, and they are now perceived as severely limiting or
    > devastating for the evolution of the standard (and this is now a
    > severe problem for rare scripts that are still not encoded, which
    > will be difficult to get widely supported in implementations).
    > This is something that, some day, will block further evolution and
    > put an end to the standard, so it puts an entire industry at risk of
    > a future major switch to a new standard, with unavoidable
    > incompatibilities and large migration costs.
    > Regarding Han, the current desire to keep ideographs encoded only at
    > the level of the whole glyph square will not be maintainable (and
    > consistency problems have already occurred, with multiple encodings
    > of the same square), simply because the composition of these
    > ideograph squares was not documented.
    > It was said that ideographs do not compose easily into squares. This
    > may be true for some well-known blocks, but I do not think it is
    > really the rule, so these exceptions could have been handled like
    > ligatures. Had Han been encoded consistently, the decomposed model
    > based on radicals would have been privileged.
    > In the same spirit, it would have been enough to encode Hangul with
    > just the basic jamos (as they are learnt at school), using a single
    > syllable break character only where needed to distinguish final from
    > leading consonants, plus reasonable default rules for placing these
    > syllable breaks. The whole Hangul script would then have been
    > encodable like a regular alphabet, which is what it really IS, a
    > fact that was forgotten: Unicode and ISO have unnecessarily
    > complicated what was really a very simple script, and have wasted
    > tens of thousands of positions in the BMP just for Hangul... instead
    > of documenting a basic composition model which, for Hangul, is in
    > fact very simple and extremely regular.
