Re: What is the principle?

From: Jim Allan (
Date: Fri Mar 26 2004 - 16:33:58 EST

    Arcane Jill posted:

    > (A) A proposed character will be rejected if its glyph is identical in
    > appearance to that of an extant glyph, regardless of its semantic
    > meaning,

    Obviously not.

    Unicode encodes characters not glyphs. That particular glyphs of one
    character are normally indistinguishable from particular glyphs of
    another character (though perhaps in a different style) does not mean
    that the characters themselves would be usefully unified.

    Examples from the recent past are the deunification of Coptic from Greek
    and the introduction of numerous Latin alphabet letter forms in various
    styles as mathematical characters.
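
    As a minimal sketch of the point above, Python's standard unicodedata
    module can list a few characters whose usual glyphs look like Latin
    capital A but which are encoded separately. The particular code points
    are chosen here for illustration and are not taken from the original
    message; the Coptic one needs a Unicode database that includes the
    separately encoded Coptic block.

```python
import unicodedata

# Characters whose usual glyphs resemble Latin capital A but which are
# separate encoded characters (illustrative selection, not from the post).
look_alikes = ["\u0041", "\u0391", "\u0410", "\u2C80", "\U0001D400"]

names = [unicodedata.name(ch) for ch in look_alikes]
for ch, name in zip(look_alikes, names):
    print(f"U+{ord(ch):04X}  {name}")

# Every entry is a different code point, however similar the glyphs are.
distinct = len(set(look_alikes)) == len(look_alikes)

# The mathematical letter has a compatibility decomposition to Latin A,
# so NFKC folds it; Greek alpha has no such mapping and stays Greek.
math_folds_to_latin = unicodedata.normalize("NFKC", "\U0001D400") == "A"
greek_folds_to_latin = unicodedata.normalize("NFKC", "\u0391") == "A"
```

    The normalization check suggests one way the mathematical letter forms
    differ from a script split: they remain tied to the Latin letters by a
    compatibility mapping, while Greek, Cyrillic and Coptic letters are not.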

    > (B) A proposed character will be rejected if its semantic meaning is
    > identical to that of an extant character, regardless of the appearance
    > of its glyph,

    Obviously not.

    For example, that a proposed character has the approximate semantic
    value of IPA _b_ doesn't mean that it should be taken as just a
    variant glyph of IPA _b_ and coded as U+0062. By that rule, a large
    number of uncoded scripts could easily be coded by assigning their
    glyphs to already encoded characters of approximately the same meaning
    and using a font change to render the script.

    But changing to a different script by a font change (as opposed to a
    different style of the same script) is not Unicode philosophy except in
    the case of cipher character sets.
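
    A rough sketch of what is lost under the font-change approach: with
    separate encodings, plain text itself records which script a letter
    belongs to. The helper scripts_by_name below is hypothetical and only
    guesses the script from the first word of each character name; it is
    not a real script-property lookup.

```python
import unicodedata

def scripts_by_name(text):
    """Guess each character's script from the first word of its name.

    Hypothetical helper for illustration only; a crude stand-in for the
    Unicode Script property, which the standard library does not expose.
    """
    return [unicodedata.name(ch).split()[0] for ch in text]

latin_b = "\u0062"      # LATIN SMALL LETTER B
greek_beta = "\u03B2"   # GREEK SMALL LETTER BETA
cyrillic_be = "\u0431"  # CYRILLIC SMALL LETTER BE

# Separately encoded: the script survives in plain text.
encoded = scripts_by_name(latin_b + greek_beta + cyrillic_be)

# Font-change approach: every "b-like" letter is stored as U+0062, so in
# plain text all three collapse to the same Latin character.
font_hack = scripts_by_name(latin_b * 3)
```

    Here encoded holds three different script names, while font_hack holds
    the same one three times; only the font, which plain text does not
    carry, would tell the letters apart.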

    > (C) A proposed character will be rejected if either (A) or (B) are true

    A redundant suggestion.

    However, if both (A) *and* (B) were true, there would be less likelihood
    that a new encoded character would be of value, especially if users are
    already *happily* using a character already coded in Unicode.

    That is, if the normal glyphs of a proposed new character were mostly
    identical to the normal glyphs of an already encoded character, and the
    meanings associated with the proposed character mostly corresponded to
    the meanings associated with that same encoded character, then it is
    quite likely that no need would be seen to encode the proposed new
    character.

    But even that would not be a rule.

    If, for example, in a particular script that has yet to be encoded it
    chanced that the character used for the normal sound indicated by IPA
    _b_ actually looked like Latin letter _b_, it would still likely be
    encoded as part of that script.

    The separate encoding of Coptic characters is one precedent not forced
    by compatibility with previous character encodings.

    By another precedent, in the case of punctuation characters and
    diacritical marks similarity of form with already encoded characters
    bears more weight than it does with non-punctuation characters and
    non-diacritical characters.

    > (D) None of the above


    Though of course these are points that would be considered in coming to
    a decision.

    There is a debated area here, which comes to the fore on occasion, for
    example in regard to old Semitic scripts and whether particular Semitic
    scripts should be lumped together or distinguished by separate encodings.

    When the question of unifying or distinguishing between characters is
    considered, it seems to me that the most important question is how
    confusing or useful it would be to unify or distinguish between those
    particular characters from the point of view of current users or
    expected users.

    Unicode should do what is most useful.

    Honest debate does arise, because what is useful in one sphere or from
    one point of view may cause problems in another sphere or from another
    point of view. Sometimes there is no definite correct answer.

    Jim Allan
