Re: Three modest proposals

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Apr 03 2011 - 23:42:16 CDT


    On 4/3/2011 3:32 PM, Peter Constable wrote:
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Michael Everson
    >
    >>> In case not, I consider encoding of characters for hatches to be an extremely bad idea: these are not characters but graphic fills, of which there are a vast number.
    >> They are merely an extension of a small set of such fills, which have
    >> been standardized for centuries. It is a mistake to suggest that this is
    >> an endless set. The specific fill patterns and their meanings are in fact
    >> well-defined within a number of traditions, as shown in the proposal.
    > The existing characters came from an age of non-graphic / character-only computer displays and are encoded purely for legacy reasons, not because in today's world of graphical user interfaces it's a good idea to encode them as graphic characters. In WG2, we recently saw a preliminary proposal from China in which they proposed some line-drawing characters, and it was explained to them why the existing characters are encoded and why adding new characters was not a good idea, and they understood that explanation and agreed....
    >

    It's clear from this exchange that not all participants agree on the
    status of the existing characters and what that means for using them as
    a precedent for additional encoding.

    This therefore requires a somewhat longer reply, so bear with me.

    In understanding this argument, it really helps to be familiar with
    character display before the advent of graphical user interfaces. And of
    course, having seen and worked with the original implementations of
    these Asian legacy sets doesn't hurt either.

    There are literally hundreds of characters encoded in Unicode that
    exist solely to allow a Unicode-based implementation to "pretend" to be
    a legacy implementation by mapping oddball legacy character codes to
    unique Unicode values.

    When the transition was first made from using the legacy character sets
    themselves to their "virtual" representation using Unicode and lossless
    character code mapping, the presence of these compatibility characters
    was vital.
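
    (To make "lossless character code mapping" concrete, here is a minimal
    sketch in Python; the byte value 0xA1A1 is merely an illustrative
    KS X 1001 / EUC-KR code, and the euc_kr codec stands in for a legacy
    conversion table:

        legacy_bytes = b"\xa1\xa1"                    # illustrative EUC-KR code
        text = legacy_bytes.decode("euc_kr")          # legacy code -> Unicode
        assert text.encode("euc_kr") == legacy_bytes  # round trip is lossless

    One-to-one compatibility characters are what keep such a round trip
    lossless even for legacy codes that have no obvious Unicode counterpart.)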

    Now that technology has moved on, it is doubtful that these characters
    are even used any longer, because the downstream legacy implementations
    with their character-mode displays have probably long since been
    replaced by graphical UIs implemented directly in Unicode.

    This does not mean that such characters should be deprecated - it's
    simply unknown whether they are being used somewhere or whether legacy
    data exists that contains them. However, it does mean that their
    presence is no longer a "productive" precedent.

    What is a non-productive precedent?

    A non-productive precedent is one where, for historical or other
    reasons, certain characters, while still present in Unicode, are
    considered "mistakes" or "exceptions", and there is therefore broad
    support for not adding more characters like them.

    So why isn't there a list of characters that are not precedents?

    There are several reasons. Many characters were considered to be encoded
    for "compatibility" when Unicode was created (or when they were added to
    Unicode later). Over time, some of these characters were found to have
    very active use in certain environments, or to be indistinguishable from
    characters that would otherwise have to be encoded again to cover
    certain needs.

    Also, in a process of discovery over the two decades of Unicode's
    existence, the exact boundaries implied by the character-glyph model
    have had to be adjusted (in a limited number of cases) to better match
    the rather messy real world of character usage.

    Essentially, this boils down to the fact that whether a character is a
    suitable precedent for adding other, similar characters is not a
    black-and-white question, but can take on any value from "definitely
    not" to "very definitely".

    In this case, the situation is that the KS X 1001 characters are
    "definitely not" precedents. If it can be shown that these characters
    have found other uses that are unrelated to their identity as KS X 1001
    compatibility characters, then that would affect their value as
    precedents.

    For that to be shown, there needs to be evidence of documents or
    implementations that use them as Unicode characters (not merely of
    similar-looking graphic elements in text).

    The proposal does not give such evidence; therefore, for these
    characters, the default reply to proposals to extend the set remains not
    to extend it, because these characters are "definitely not" considered
    precedents (given their particular encoding history). The statement
    "these characters are a modest extension to an existing set" is
    therefore misleading: the existing characters are of such different
    origin and different use that they cannot be considered a partial
    encoding of the proposed characters.

    Secondly, the proposal doesn't give evidence that any of the characters
    are used in plain text today, or that such usage is urgently needed.
    (Aside: I think Unicode and WG2 should formally recognize that "urgently
    needed" has been a valid reason for encoding characters for a while now,
    and not only for currency symbols - the claim that only "existing use"
    is a valid rationale is not backed by the facts.)

    If evidence of use as *characters*, or of an "urgent need" on the part
    of a significant user community, could be established, then that would
    be sufficient to consider the entire set of proposed characters on its
    merits. Unifications with existing characters would then be made as
    appropriate.

    A./


