Re: [hebrew] Re: Aramaic unification and information retrieval

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 24 2003 - 19:18:15 EST

    "Michael Everson" <everson@evertype.com>
    > We have encoded 70,000 of them.

    It all depends on how you define characters. Most ideographs are composed,
    but Unicode and the CJK unification working groups have so far failed to
    define coherently how these characters actually compose, so we are still
    witnessing an ever-growing number of compound ideographs, created every
    day by Han users.

    If Latin characters were counted the way Han is, we would probably reach a
    similar number (maybe even more) of composed "characters". It is just
    unfortunate that Han lacks a way to describe its composition model. (This
    used to be the case for the Hangul alphabet too, but recent work seems to
    demonstrate that the complexity of Hangul in Unicode is only superficial,
    and that the encoding overlooks the actual usage and rules inherent to the
    script.)
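
    To illustrate the contrast drawn above, here is a minimal Python sketch.
    It assumes the arithmetic Hangul syllable decomposition described in the
    Unicode Standard and uses Python's standard unicodedata module; the
    helper names (decompose_hangul, code_points) are illustrative only and
    do not come from this thread. A precomposed Latin letter and a Hangul
    syllable both break down into smaller encoded parts, while a Han
    ideograph has no such decomposition.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trailing = T_BASE + s_index % T_COUNT
    jamo = chr(leading) + chr(vowel)
    if trailing != T_BASE:    # T_BASE itself means "no trailing consonant"
        jamo += chr(trailing)
    return jamo


def code_points(text: str) -> list[str]:
    """Format each character of text as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Latin: a precomposed letter has a canonical decomposition.
print(code_points(unicodedata.normalize("NFD", "\u00E9")))   # e + combining acute
# Hangul: the decomposition is purely arithmetic.
print(code_points(decompose_hangul("\uD55C")))
# Han: no canonical decomposition; the ideograph stays a single unit.
print(code_points(unicodedata.normalize("NFD", "\u597D")))
```

    The arithmetic result agrees with what unicodedata.normalize("NFD", ...)
    returns for the same syllable, whereas the ideograph comes back
    unchanged: its internal structure is not expressed by the encoding.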

    I'm sad to say that I really think the Unicode character model is very
    weak except for LTR alphabetic scripts like Latin, Greek and Cyrillic...
    And this in turn affects the W3C character model as well. New concepts are
    needed to correctly handle the actual properties of languages used by
    billions of people who are not used to the English language, or to the
    Unicode formalism and working methods.


