Re: Default properties for PUA characters???

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Dec 03 2002 - 13:23:48 EST

    Ken is correct: the default properties are somewhat different for ideographs
    than for PUAs. In addition, PUAs are a special case compared to other
    characters; implementations are free, within very broad limits, to change
    the default properties associated with a PUA code point to whatever is
    appropriate to whatever private-use character definition the application
    gives to that code point.

    In other words, an application that treats a particular PUA code point as
    an ideograph is free to change its default properties to match Ken's list
    (and likewise for other properties):

    gc=Lo (general category = Other_Letter)
    ccc=0 (combining class = 0, i.e. Not_Reordered)
    bc=L (bidi class = strong Left_To_Right)
    sc=Hani (script = Han)
    lb=ID (line break = Ideographic)
    ea=W (east asian width = Wide)

    If an application treated a particular PUA character as a Greek Linear B
    character, on the other hand, it would assign yet different properties.
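
    As a rough illustration of the override described above, here is a minimal
    Python sketch. The names IDEOGRAPH_PROPS, PUA_OVERRIDES, is_private_use and
    properties are hypothetical, and the assignment of U+E000 to an ideograph
    is only an assumed private agreement. Python's unicodedata module does not
    expose script or line break, so sc and lb appear only when an override
    supplies them.

```python
import unicodedata

# Ideograph-like property values from the list above, keyed by short alias.
IDEOGRAPH_PROPS = {"gc": "Lo", "ccc": 0, "bc": "L", "sc": "Hani", "lb": "ID", "ea": "W"}

# Hypothetical private agreement: this application treats U+E000 as an ideograph.
PUA_OVERRIDES = {0xE000: IDEOGRAPH_PROPS}


def is_private_use(cp):
    """True if the code point has the default general category Co (Private_Use)."""
    return unicodedata.category(chr(cp)) == "Co"


def properties(cp):
    """Return the default properties, replaced by any override for a PUA code point."""
    ch = chr(cp)
    props = {
        "gc": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
        "bc": unicodedata.bidirectional(ch),
        "ea": unicodedata.east_asian_width(ch),
    }
    # Overrides are honoured only for private-use code points.
    if is_private_use(cp):
        props.update(PUA_OVERRIDES.get(cp, {}))
    return props


if __name__ == "__main__":
    print(properties(0xE000))  # overridden: treated as an ideograph
    print(properties(0xE001))  # no override: the PUA defaults
```

    An application that read the same code point as a Linear B character would
    register a different dictionary in PUA_OVERRIDES; the lookup itself would
    not change.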

    Now in practice, the vast majority of PUA characters in use represent
    ideographs, mapped from East Asian standards. For that reason, *in the
    absence of other protocols establishing the precise usage of the PUA
    characters*, we have found that it is generally best practice to interpret
    the PUA characters as ideographs. However, applications are free to
    interpret them however they want.
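
    The fallback policy in the paragraph above could be sketched as follows.
    This is only an illustration: interpret_pua, the agreement argument and
    IDEOGRAPH_DEFAULTS are hypothetical names, and falling back to the
    ideograph values for a code point that an agreement does not mention is my
    assumption, not something the standard spells out.

```python
import unicodedata

# Ideograph-like values used when nothing else defines the character.
IDEOGRAPH_DEFAULTS = {"gc": "Lo", "ccc": 0, "bc": "L", "sc": "Hani", "lb": "ID", "ea": "W"}


def interpret_pua(cp, agreement=None):
    """Pick properties for a PUA code point.

    agreement is an optional mapping from code point to a property dict,
    standing in for whatever protocol establishes the private usage.
    """
    if unicodedata.category(chr(cp)) != "Co":
        raise ValueError("U+%04X is not a private-use code point" % cp)
    if agreement is not None and cp in agreement:
        # An explicit private definition always wins.
        return dict(agreement[cp])
    # No protocol covers this code point: interpret it as an ideograph.
    return dict(IDEOGRAPH_DEFAULTS)


if __name__ == "__main__":
    print(interpret_pua(0xE000))
    # Hypothetical agreement that reads U+E000 as a Linear B character.
    print(interpret_pua(0xE000, {0xE000: {"gc": "Lo", "bc": "L", "sc": "Linb"}}))
```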

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "John Cowan" <jcowan@reutershealth.com>
    To: <kenw@sybase.com>
    Cc: <wittern@kanji.zinbun.kyoto-u.ac.jp>; <unicode@unicode.org>
    Sent: Monday, December 02, 2002 21:08
    Subject: Re: Default properties for PUA characters???

    > Kenneth Whistler scripsit:
    >
    > > So I'd say that the XML Core WG has got the situation only
    > > partially correct for Unicode PUA characters.
    >
    > As the actual author of that Core WG text, mea culpa. But I was basing
    > my remarks on things said on this list.
    >
    > --
    > All Gaul is divided into three parts: the part      John Cowan
    > that cooks with lard and goose fat, the part        www.ccil.org/~cowan
    > that cooks with olive oil, and the part that        www.reutershealth.com
    > cooks with butter. -- David Chessler                jcowan@reutershealth.com
    >
    >


