RE: An attempt to focus the PUA discussion [long]

From: Ernest Cline (ernestcline@mindspring.com)
Date: Thu Apr 29 2004 - 16:18:02 EDT

    > ----- Original Message by Rich Gillam-----
    > ...
    >
    > Seems to me that the choice of defaults was designed to irritate the
    > smallest number of people possible and cover the widest range of
    > use cases possible, and that we're now hearing from people in that
    > "smallest possible" group.
    >
    > Those people have legitimate needs. How should they be
    > accommodated, and how does Unicode participate in that process?
    > Seems there are a number of options:
    >
    > 1) Change the default properties for some range of the PUA. This is
    > what people seem to be pushing most hard.

    Actually, no. While it seems like an obvious first solution, the problems
    you mention are quickly pointed out to the people proposing it, and they
    then start to push for option 2).

    > 2) Leave the current PUA alone, but set aside a new PUA, say Planes
    > 12 and 13. This solves the existing-use problem, but you still have the
    > question of just how you subdivide the range, and it starts to cut down
    > significantly on the code points available for actual standardization.

    It won't require so many code points. I have been working on such
    a proposal, and it requires not even a fifth of a plane, let alone two
    full planes. I even hope to make it smaller.

    > 3) Define ad-hoc standards that are based on Unicode but make
    > character assignments in the PUA and lobby application vendors
    > to support these encodings in addition to regular Unicode.

    I just can't see application or OS vendors choosing to pick a
    single PUA standard that is different from the Unicode defaults.

    > 4) Lobby for operating-system vendors to extend their text engines
    > to allow properties of PUA code points to be configured.

    While this would be a possible solution, it has some real drawbacks
    as well, especially when different Private Use assignments overlap
    in the code points they use and have character properties that
    differ for the same character.
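
    A minimal sketch of that overlap problem, in Python. The two property
    tables and the helper find_conflicts are hypothetical, made up purely
    for illustration; no real private-use agreement or OS configuration
    format is implied.

```python
# Hypothetical property tables for two unrelated private-use agreements.
# Keys are code points; values are the properties each agreement wants.
scheme_a = {
    0xE000: {"bidi": "R", "ccc": 0},
    0xE001: {"bidi": "L", "ccc": 0},
}
scheme_b = {
    0xE000: {"bidi": "L", "ccc": 230},
    0xE001: {"bidi": "L", "ccc": 0},
    0xE002: {"bidi": "L", "ccc": 0},
}


def find_conflicts(a, b):
    """Return {code point: [property names]} where both schemes assign
    the same code point but disagree on at least one property."""
    conflicts = {}
    for cp in sorted(a.keys() & b.keys()):
        names = a[cp].keys() | b[cp].keys()
        differing = sorted(n for n in names if a[cp].get(n) != b[cp].get(n))
        if differing:
            conflicts[cp] = differing
    return conflicts


for cp, names in find_conflicts(scheme_a, scheme_b).items():
    print(f"U+{cp:04X} conflicts on: {', '.join(names)}")
```

    A system-wide property override can satisfy only one of the two
    schemes for U+E000, so text using the other scheme would still
    behave incorrectly.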

    > 5) Write specialized applications that are designed to deal with
    > certain scripts and address the needs of user communities
    > whose needs aren't being met right now.

    The problem is, for most potential private uses, if there is sufficient
    interest that a specialized application gets written just to handle that
    one private use, it probably has enough interest to merit being
    encoded in Unicode itself.

    > 6) Use markup or other fancy-text mechanisms to override the
    > default properties. There are plenty of controls for controlling
    > directionality, cursive joining, and line breaking. It may be
    > inconvenient to use them, but it seems like a viable workaround
    > while waiting for something to get into Unicode, and there's no
    > implementation lag. What problems do the existing mechanisms
    > not solve? Maybe the discussion should focus on this question--
    > are there mechanisms that should be added to Unicode or some
    > markup language to help enable some of these scripts?

    Markup already uses the selection of a font to establish which set
    of Private Use characters is in use. Bidi Class and Line Break can
    be handled, if not elegantly, via the existing codes. However, the
    behaviors that cause the most interest in a better-defined private
    use area, combining marks and cased letters, cannot be handled
    in a generic manner by formatting characters. It is simply impossible
    to simulate characters with a non-zero canonical combining class in
    Unicode with anything other than a character that has the appropriate
    canonical combining class. Indicating case would require a means
    to indicate which case the character is and where its other case is.
    > 7) Design custom fonts that cannibalize existing code points that
    > have the right sets of properties.

    This is a commonly chosen option right now, only it is usually done
    with respect to legacy encodings so as to make use of the keyboard
    mapping that is commonly associated with that legacy encoding.
    It's not pretty, but most private users look for something that will make
    their private use easy to accomplish, despite the problems it causes.

    Any solution which requires users of the Private Use scheme to do
    more than install a few files on their system to get it to work will
    probably not be used, since schemes such as that embodied by
    option 7) work now, despite the trouble they cause for others not
    using the scheme.