Re: Request clarification on disunification based on different character properties

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Sep 07 2009 - 18:59:10 CDT

    On 9/7/2009 5:50 AM, verdy_p wrote:
    >> From: "Shriramana Sharma"
    >> To: "unicode@unicode.org"
    >> Cc:
    >> Subject: Request clarification on disunification based on different character properties
    >>
    >>
    >> Hello. Again the disunification question. P 29 of the P&P document:
    >>
    >> If a character disunification cannot be achieved by adding one
    >> new character without requiring a change in very significant properties
    >> of the existing character and without changing the representative glyph
    >> or range of expected glyphs for the existing character, then new
    >> characters will be added for each of the distinct, specific letterforms
    >> required.
    >>
    >>
    >>
    To that end Philippe proposes:
    > "If a character unification cannot be maintained without changing very significant properties of the existing
    > character and without changing the representative glyph or range of expected glyphs for the existing character, then
    > new characters will be added for each of the distinct, specific letterforms required."
    Which is an entirely different statement.

    First, note the change in context from "dis-"unification (an event,
    triggered by a proposal) to "maintaining unification" (a state).

    In the P&P, the context is always that of a proposal that has been
    submitted that would ask for some change in encoding. In this case, it
    would ask for a new character to cover some textual entity that was,
    heretofore, encoded with an existing character. The standard example of
    that situation is the character "HYPHEN-MINUS" which has been (and is)
    used for both hyphen and minus sign (as well as some other dash-like
    entities which we'll ignore here).

    The principle states what to do when someone comes and asks for a
    specific character that only means "MINUS" and looks a bit different.

    If this request is found acceptable, then, the principle states, it's
    not enough to just code a "MINUS", but it's also necessary to code a
    "HYPHEN".

    The rationale for that is that by adding both new characters, the
    existing character can be used (as before) in an ambiguous manner. If,
    instead, only a "MINUS" were added, then users who wanted to contrast
    minus sign and hyphen would need to use the ambiguous character as if it
    were exclusively a hyphen. That would change the nature (read very
    significant properties) of that character in a way that the P&P finds
    not acceptable.

    If the proposal had instead been to add just a HYPHEN, then there would
    have been pressure to treat the formerly ambiguous character as a MINUS,
    including, perhaps, having common implementations change its glyph over
    time. So that's equally objectionable.
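
    For readers who want to see the three characters side by side, here is
    a minimal Python sketch using the standard library's unicodedata
    module. The code points U+2010 and U+2212 are not named in the
    discussion above; they are assumed here to be the encoded characters
    that correspond to "HYPHEN" and "MINUS", and describe() is a helper
    name made up for this illustration.

```python
import unicodedata

# The ambiguous legacy character, followed by the two specific characters
# assumed to correspond to "HYPHEN" and "MINUS" in the discussion above.
CANDIDATES = ["\u002D", "\u2010", "\u2212"]


def describe(ch):
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(*describe(ch))
```

    The ambiguous character keeps its own name and properties next to the
    two specific ones, which is the outcome the principle is meant to
    protect.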

    The P&P states, in essence, that accepting a proposal may not result
    in changing the *identity* of any existing character - something that's
    prohibited by the character encoding stability policy. Disunification is
    allowed if it can be carried out (for example by encoding several new
    characters) in a way that preserves the (essential) interpretation of
    the existing character.

    Here we come to a new wrinkle. Sometimes character X is used only
    occasionally for purpose Y, usually because the real character isn't
    encoded and people make do. In that case, you can argue, the (essential)
    identity of X doesn't actually contain any aspect of Y, or so little as
    to not be predominant. In that case, just coding Y is acceptable.

    Making a judgment on whether this is a case that requires both new
    characters or only one is not something that proceeds from algorithmic
    application of rules. That's why the term "significant" is not further
    defined, beyond its ordinary use in the English language.

    One final observation. By rejecting any and all character proposals,
    it's always possible to maintain, indefinitely, any existing "character
    unification", for whatever reason. Therefore, the P&P does not talk
    about maintaining unifications, but how to deal with requests that would
    result in disunifications.

    We've now seen why suggestions for revision of the P&P text are best
    approached very cautiously, and only after a clear understanding of the
    impact of such changes. The document distills over a decade of
    experience of major participants in the character encoding effort, and
    it is primarily written for an audience of experts (in other words,
    delegates to WG2) to help ground their decisions in well-understood
    precedents. It's not a cookbook for deciding character encoding
    questions by rote.

    Proposals should focus on making a case for a particular encoding change
    on their own merits, not by arguing chapter and verse from the P&P.

    A./


