Re: What is the principle?

From: Ernest Cline
Date: Mon Mar 29 2004 - 18:20:32 EST

    > [Original Message]
    > From: Kenneth Whistler
    > Date: 3/29/2004 2:28:25 PM
    > Subject: Re: What is the principle?
    > Ernest Cline stated:
    > > The standard is quite clear that if a Variation Selector is recognized,
    > > but not the sequence it is in, then it should be treated the same as
    > > if no selector was present.
    > Which is true.
    > >
    > > This is one reason why transferring some or all of the Variation
    > > Selectors on the SSP to Private Use is a possibility if they are not
    > > going to have any official uses.
    > This, however, is distinctly inadvisable, for several reasons.
    > First, the 240 Variation Selector characters on Plane 14 were added
    > *explicitly* to deal with Han variation issues, which involve
    > many, many more possible variants, in some cases, than the
    > typical numerosity for the occasional variants noted in other
    > scripts.
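The rule stated at the top of this exchange (and confirmed as true) is that a variation selector in an unrecognized sequence is displayed as if it were absent. A minimal Python sketch of that display behavior follows. The table `RECOGNIZED` and the function names are hypothetical illustrations; a real renderer would take its recognized sequences from the Unicode data files rather than a hard-coded set.

```python
# Variation selectors: VS1-VS16 (U+FE00..U+FE0F) in the BMP and
# VS17-VS256 (U+E0100..U+E01EF) on Plane 14.
def is_variation_selector(ch: str) -> bool:
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

# Hypothetical toy table of recognized (base, selector) pairs. A real
# renderer would load its sequences from the Unicode data files.
RECOGNIZED = {("\u2229", "\ufe00")}

def resolve_for_display(text: str) -> str:
    """Return the string a renderer would shape: a selector is kept only
    if it forms a recognized sequence with the character before it;
    otherwise it is dropped, so the base displays as if no selector
    were present. The stored text itself is not modified."""
    out = []
    prev = None
    for ch in text:
        if is_variation_selector(ch):
            if (prev, ch) in RECOGNIZED:
                out.append(ch)
            # Unrecognized sequence: ignore the selector.
        else:
            out.append(ch)
        prev = ch
    return "".join(out)

print(resolve_for_display("\u2229\ufe00") == "\u2229\ufe00")  # True: recognized
print(resolve_for_display("A\ufe00"))  # A: selector ignored
```

Ignoring the selector at display time, rather than deleting it from the stored text, keeps the sequence intact for any later software that does recognize it.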

    Well, I said that was only a possibility if they weren't going to have
    any official uses. Since some are planned, it would be reasonable to
    retain them for that purpose, although it does seem a bit strange that
    they were assigned before they were needed for official uses, and that
    they were assigned in a manner that leaves an empty row at U+E01F0 to
    U+E01FF when there apparently wasn't a scheme being planned that would
    need exactly 256 variation selectors.
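The code point arithmetic behind that empty row can be checked directly. This sketch uses the assignments current since Unicode 4.0 (VS1-VS16 at U+FE00..U+FE0F, VS17-VS256 at U+E0100..U+E01EF); the constant and function names are illustrative only.

```python
# Variation selector ranges (inclusive).
VS1, VS16 = 0xFE00, 0xFE0F        # Variation Selectors block (BMP)
VS17, VS256 = 0xE0100, 0xE01EF    # Variation Selectors Supplement (Plane 14)

def span(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1

plane14_count = span(VS17, VS256)              # 240
total_count = span(VS1, VS16) + plane14_count  # 256

# U+E0100..U+E01FF is one 256-cell row; the selectors fill the first
# 240 cells, leaving the last 16 (U+E01F0..U+E01FF) unassigned.
row_end = VS17 | 0xFF
unused = (VS256 + 1, row_end)

print(f"Plane 14 selectors: {plane14_count}")
print(f"All selectors: {total_count}")
print(f"Unused: U+{unused[0]:05X}..U+{unused[1]:05X}")
```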

    > Second, the UTC is considering a scheme for dealing with existing
    > large collections of Han variants by explicitly dedicating 128
    > of those 240 to a preexisting glyph variant registration scheme,
    > to move the Han variation problem off dead center (given that the
    > task of spelling out exactly what *are* the variants is an enormous
    > problem for Han).

    Is there a pointer you could provide for this glyph variant registration
    scheme?

    > Third, the proposal to "transfer ... some or all of the Variation
    > Selectors on the SSP to Private Use" is unclear on the concept of
    > Private Use. The UTC will make *no* semantic encoding commitment
    > regarding what a private use character is to be used for. That would
    > include *not* specifying that some range of Private Use characters
    > be dedicated to use as variation selectors (privately defined).
    > Anyone who wanted to put in place their own private Idaho of
    > two-character encoding for Mende or whatever, could simply define
    > that private use space as they wish. Of course they cannot then
    > expect automatic rendering (or other) support from standard OS
    > interfaces, but that is the fundamental nature of Private Use
    > characters.
    > Essentially what you seem to be asking for is for the UTC to
    > relax the restriction of definition of *variation sequences* --
    > i.e. let some of the variation selectors be used on an ad hoc
    > basis by consenting adults. But that was *explicitly* ruled out
    > by the UTC as a potential barrier to interoperability and because
    > it would be an invitation to chaotic glyph encoding.

    For Variation Selectors that are used or contemplated
    for official sequences, I agree that they should not be used
    for ad hoc sequences. What I was asking for was a set of
    Private Variation Selectors whose private use would not
    interfere with official variation sequences, whether already
    assigned or assigned in the future. Since it appeared
    that there were many more variation selectors than would
    likely be needed for official variation sequences,
    transferring some of them to be used ONLY for private
    variation sequences seemed a possible way to
    make use of the excess selectors instead of adding new
    characters that would have the same function. Since an
    official use is contemplated for the existing variation
    selectors, a transfer to private use is not desirable.

    However, if I am understanding you correctly, the answer to
    even new Private-Use-only Variation Selectors is "too bad."
    If a Private Use character needs anything other than the
    default properties, Unicode intentionally makes using it as
    difficult as possible. You consider this a boon; I feel it
    is a fundamental flaw.

    Unicode would benefit from having ranges of Private Use
    characters known to have certain character properties,
    such as being a Variation Selector or, to take a topic from
    a recent thread, having a default strong RTL property for
    the Bidirectional Algorithm. I can appreciate the desire to
    further interoperability, but I don't think that unhelpfulness
    on the part of Unicode toward such uses helps achieve
    interoperability. It merely moves the interoperability problem
    to a different place rather than solving it, and does so
    in a manner that impairs usability.
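To make the "default characteristics" concrete: every Private Use code point is given the same properties, General_Category Co and Bidi_Class L, while a standard variation selector is a nonspacing mark (Mn) with Bidi_Class NSM. The short sketch below queries Python's standard `unicodedata` module, which reports the Unicode version bundled with the interpreter rather than the 2004 data; as far as I know these particular values are unchanged. The `describe` helper is illustrative.

```python
import unicodedata

def describe(ch: str) -> str:
    """Summarize a character's General_Category and Bidi_Class."""
    return (f"U+{ord(ch):04X} category={unicodedata.category(ch)} "
            f"bidi={unicodedata.bidirectional(ch)}")

# Every Private Use code point gets the same defaults:
# General_Category Co and Bidi_Class L (strong left-to-right).
print(describe("\ue000"))  # U+E000 category=Co bidi=L
print(describe("\uf8ff"))  # U+F8FF category=Co bidi=L

# A standard variation selector, by contrast, is a nonspacing mark.
print(describe("\ufe00"))  # U+FE00 category=Mn bidi=NSM
```

No Private Use range defaults to Bidi_Class R or to selector-like behavior, which is the gap the paragraph above describes.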

    The chaotic glyph encoding you fear is exactly why, once
    official counterparts to private use characters (of any
    variety) are established and supported, new documents tend
    to use the official characters and pre-existing documents
    are migrated to the official version, just as now happens
    with the various ad hoc 8-bit pseudo-encodings when a version
    of Unicode that supports their characters becomes available.
    I fail to see why supporting robust private use characters
    would either impair the adoption rate of official characters
    or delay the addition of new scripts or characters to the
    official Unicode repertoire. Indeed, one could argue that
    making it easier to experiment with private use characters
    might speed the adoption of new scripts and characters that
    need something other than the default properties now
    assigned to all private use characters, since it would be
    easier to test their utility.

    This archive was generated by hypermail 2.1.5 : Mon Mar 29 2004 - 19:10:23 EST