Re: What is the principle?

From: Asmus Freytag
Date: Fri Mar 26 2004 - 15:41:28 EST

    At 01:33 PM 3/26/2004, Jim Allan wrote:
    >Arcane Jill posted:
    >>(A) A proposed character will be rejected if its glyph is identical in
    >>appearance to that of an extant glyph, regardless of its semantic
    >>meaning,
    >Obviously not.
    >Unicode encodes characters not glyphs. That particular glyphs of one
    >character are normally indistinguishable from particular glyphs of another
    >character (though perhaps in a different style) does not mean that the
    >characters themselves would be usefully unified.
    >Examples from the recent past are the deunification of Coptic from Greek
    >and the introduction of numerous Latin alphabet letter forms in various
    >styles as mathematical characters.

    Counterexamples are the rejection of various punctuation marks based on
    specific usage, e.g. the DECIMAL DOT.

    >>(B) A proposed character will be rejected if its semantic meaning is
    >>identical to that of an extant character, regardless of the appearance
    >>of its glyph,
    >Obviously not.
    >For example, that a proposed character has the approximate semantic value
    >of IPA _b_ doesn't mean that it should be taken as just a variant glyph
    >of IPA _b_ and coded as U+0062. By that rule a large number of uncoded
    >scripts could be easily coded by assigning the glyphs to encoded glyphs of
    >approximately the same meaning and using a font change to render the script.
    >But changing to a different script by a font change (as opposed to a
    >different style of the same script) is not Unicode philosophy except in
    >the case of cipher character sets.

    There are three types of glyph variation: context-based glyph variation (I
    include language as a context here), systematic glyph variation (font
    shift) and a kind of free variation.

    Technology supports the first two quite well, with a well-understood
    division of labor between plain text and well-defined markup information,
    subject to the availability of fonts. The third, where someone picks a
    different shape for some instances of a character (or symbol) in a text,
    but not for other instances, is not currently well supported, and most
    likely never will be. These 'random' selections of individual glyph shapes,
    though, tend to carry meaning to a reader. Therefore, the premise that
    they convey the same semantics is not necessarily satisfied any longer.
    At one extreme, for example the use of highly picturesque icons for one of
    the symbols, the effect is merely decorative or idiosyncratic and the
    premise is well satisfied. At the other extreme, where such variations
    have been formalized into a notational system, the premise would be
    violated. Most cases occupy the gray zone in the middle.

    There are millions of fonts out there with variations of the zodiac. Font
    shifting would seem to be the correct answer to implement glyph variations
    there. (A wrong font will ruin the mood, but not change the identity of the
    symbol.) Math and some linguistic notations are the opposite: with a wrong
    font, you lose the meaning of the text.
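
    As a rough illustration of the math case, here is a small sketch in
    Python using the standard unicodedata module. The helper names describe
    and font_folded are made up for this example, and the code point shown
    is just one of the mathematical alphanumeric letters; it is meant only
    to show that the styled letter is a separate character in plain text,
    and that folding it back to the ordinary letter discards the distinction.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and character name, e.g. for comparing look-alikes."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def font_folded(ch: str) -> str:
    """Apply compatibility normalization (NFKC), which drops font-style distinctions."""
    return unicodedata.normalize("NFKC", ch)

latin_b = "b"                 # ordinary Latin small letter b
math_bold_b = "\U0001D41B"    # bold b encoded as its own mathematical character

print(describe(latin_b))
print(describe(math_bold_b))

# The two are different characters in plain text ...
print(latin_b == math_bold_b)
# ... but compatibility folding maps the styled one back to the plain letter,
# which is exactly the loss of information a font change would cause in math.
print(font_folded(math_bold_b) == latin_b)
```

    A zodiac symbol, by contrast, stays the same single character whatever
    font draws it, so nothing comparable is lost there.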

    >>(C) A proposed character will be rejected if either (A) or (B) are true
    >A redundant suggestion.
    >However if both (A) *and* (B) were true there would be less likelihood
    >that a new encoded character would be of value, especially if users are
    >already *happily* using a character already coded in Unicode.
    >However if the normal glyphs of a proposed new character were mostly
    >identical to normal glyphs of an already encoded character and the
    >proposed new character also had meanings associated with it which mostly
    >corresponded to the meanings associated to the same already encoded
    >character then it is quite likely that there would be seen to be no need
    >to encode the proposed new character.
    >But even that would not be a rule.
    >If, for example, in a particular script that has yet to be encoded it
    >chanced that the character used for the normal sound indicated by IPA
    >_b_ actually looked like Latin letter _b_, it would still likely be
    >encoded as part of that script.
    >The separate encoding of Coptic characters is one precedent not forced by
    >compatibility with previous character encodings.
    >By another precedent, in the case of punctuation characters and
    >diacritical marks similarity of form with already encoded characters
    >bears more weight than it does with non-punctuation characters and
    >non-diacritical characters.
    >>(D) None of the above
    >Though of course these are points that would be considered in coming to a
    >decision.
    >There is a debated area here, which comes to the fore on occasion, for
    >example in regards to old Semitic scripts and whether particular Semitic
    >scripts should be lumped together or distinguished by separate encodings.
    >When the question of unifying or distinguishing between characters is
    >considered, it seems to me that the most important question is how
    >confusing or useful it would be to unify or distinguish between those
    >particular characters from the point of view of current users or expected
    >users.

    >Unicode should do what is most useful.
    >Honest debate does arise, because what is useful in one sphere or from one
    >point of view may cause problems in another sphere or from another point
    >of view. Sometimes there is no definite correct answer.

    That comes from trying to be universal. I'm sure we have not finished that


    >Jim Allan

    This archive was generated by hypermail 2.1.5 : Fri Mar 26 2004 - 16:24:04 EST