RE: Level of Unicode support required for various languages

From: Philippe Verdy
Date: Tue Oct 30 2007 - 06:32:25 CST

    > > The intent is to allow systems to represent IDSs using single glyphs,
    > > if they can and choose to do so, either through on-the-fly composition
    > > (which will almost certainly be pretty ugly) or through the ligature
    > > mechanisms available in smart fonts. The latter is more likely. In
    > > this case someone with a need to represent a particular unencoded
    > > character (or a set of such) could use a custom font to, at least, make
    > > their text look decent.
    > >
    > The intent would seem to allow for the representation through smart fonts.

    I don't think so. To me, the encoded IDCs are no different from symbols or
    from mathematical operators.
    So trying to display an IDS differently would be exactly the same kind of
    process as transforming, at rendering time, the mathematical expression
    "a*(x+y)" into "a*x+a*y".

    This is not intended, because the operator semantics of the encoded IDC
    characters are NOT defined, and there are several competing usages of these
    IDC characters within several IDS expressions, each with its own specific
    semantics and syntax.

    To me, they are encoded just so that the expressions can be encoded and
    displayed linearly, exactly like the "+" and "*" mathematical operators,
    which also have no semantics by themselves in expressions.

    For example, the semantics of one mathematical expression could be encoded as
    "a * (x + y)" or "a x y + *" or "* a + x y" or "(* a (+ x y))" or
    "*(a, +(x, y))"... and the same kind of alternative syntaxes (using the same
    encoded IDC symbols as operators) are permitted and actively used in several IDS
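
    As a rough illustration of the notations listed above, the following Python
    sketch serializes one expression tree in each of the five syntaxes. The
    tuple layout and the function names are assumptions made for this example
    only; the point is that the "+" and "*" symbols stay the same while the
    syntax around them changes.

```python
# Hypothetical sketch: one expression tree, five linear notations.
# A node is either a string (operand) or a tuple (operator, left, right).
TREE = ("*", "a", ("+", "x", "y"))


def infix(e, top=True):
    """Infix with parentheses around nested sub-expressions."""
    if isinstance(e, str):
        return e
    op, left, right = e
    text = f"{infix(left, False)} {op} {infix(right, False)}"
    return text if top else f"({text})"


def postfix(e):
    """Operands first, operator last (reverse Polish)."""
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"{postfix(left)} {postfix(right)} {op}"


def prefix(e):
    """Operator first, no delimiters (Polish)."""
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"{op} {prefix(left)} {prefix(right)}"


def sexpr(e):
    """Operator first, parenthesized (Lisp-like)."""
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"({op} {sexpr(left)} {sexpr(right)})"


def functional(e):
    """Operator written as a function applied to its arguments."""
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"{op}({functional(left)}, {functional(right)})"


if __name__ == "__main__":
    for render in (infix, postfix, prefix, sexpr, functional):
        print(render(TREE))
```

    None of these strings can be read back correctly unless the reader already
    knows which syntax was used, which is the same situation as with the IDC.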

    In other words, IDSs do not create larger grapheme clusters: each IDC
    is its own grapheme cluster, self-delimited and completely independent of
    its surroundings.
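
    A small check of the character properties, using Python's standard
    unicodedata module, is consistent with this. It is only a sketch: it does
    not run a full grapheme cluster segmentation, it just shows that the IDCs
    in the range U+2FF0..U+2FFB are plain symbols (General Category So) with no
    combining class, and that normalization leaves a sample IDS untouched. The
    sample sequence is chosen for illustration only.

```python
import unicodedata

# The IDC range U+2FF0..U+2FFB.
IDCS = [chr(cp) for cp in range(0x2FF0, 0x2FFC)]


def describe(ch):
    """Return (code point label, General Category, canonical combining class)."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))


# Sample IDS chosen for illustration: U+2FF0 followed by U+6C35 and U+6BCF.
SAMPLE_IDS = "\u2FF0\u6C35\u6BCF"

if __name__ == "__main__":
    for idc in IDCS:
        print(describe(idc))
    # Normalization does not fuse the sequence into anything else.
    print(unicodedata.normalize("NFC", SAMPLE_IDS) == SAMPLE_IDS)
```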

    You could even use the encoded IDCs for something other than Han (for example
    between Latin characters, which may be needed for notation purposes where some
    radicals are replaced by placeholder variables).

    To interpret any IDS containing any number of IDC characters and any other
    characters, you need an (unencoded) higher-order protocol that defines its
    semantics (for example by embedding an IDS within an "interpreter" object,
    but this higher-order object is impossible to specify in plain text; it may
    be possible in XML-like formats at the processing level, but NOT at the
    parsing or even at the schema-validating level).
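
    To make the idea of a higher-order protocol concrete, here is a minimal
    Python sketch of one such convention: the prefix notation in which U+2FF2
    and U+2FF3 take three operands and the other IDCs in U+2FF0..U+2FFB take
    two. This is only one possible interpretation, chosen as an assumption for
    the example; the function names are hypothetical, and a different
    convention would need a different parser over the very same characters.

```python
# Hypothetical sketch of ONE interpretation convention for IDS strings:
# prefix notation with a fixed arity per IDC.
TERNARY = {"\u2FF2", "\u2FF3"}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def _parse(text, pos):
    """Parse one component starting at pos; return (node, next position)."""
    if pos >= len(text):
        raise ValueError("truncated sequence")
    ch = text[pos]
    if ch in TERNARY:
        arity = 3
    elif ch in BINARY:
        arity = 2
    else:
        # Any other character is treated as an opaque operand.
        return ch, pos + 1
    pos += 1
    operands = []
    for _ in range(arity):
        operand, pos = _parse(text, pos)
        operands.append(operand)
    return (ch, *operands), pos


def parse_ids(text):
    """Parse a whole string under the prefix convention into nested tuples."""
    node, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError("trailing characters after a complete sequence")
    return node


if __name__ == "__main__":
    # U+2FF1 applied to U+8279 and a nested U+2FF0 sequence.
    print(parse_ids("\u2FF1\u8279\u2FF0\u65E5\u6708"))
```

    Operands are not restricted to Han here, so a sequence with Latin
    placeholder letters parses the same way under this convention.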
