Capitalization (Was: 03F3 j Greek Letter yot)

From: Hans Aberg (haberg@math.su.se)
Date: Thu Feb 17 2005 - 13:00:26 CST

    At 10:19 -0500 2005/02/17, Patrick Andries wrote:
    > Antoine Leca a écrit:
    >
    > On Wednesday, February 16th, 2005 09:05Z Radovan Garabik va escriure:
    >
    > Just across a street from here, there is a travel agency, having a
    >rather huge sign across their windows: "Preßburg reisen" - in all
    >capitals, with ß being rather styled and blocky,
    >
    >Am I alone thinking this looks like a font issue?
    >
    > I would agree.
    >
    > PREßBURG is the equivalent of small caps for me of Preßburg. I believe Unicode
    does not regulate small caps forms...

    This touches on a very interesting issue: the principles for adding
    characters to Unicode. One would think that characters should be added
    if they are semantically different, but not otherwise. For example, take the
    word "sin". In English, its semantics do not change if it is written in,
    say, boldface. Therefore, English boldface letters should not be added to
    Unicode. But now assume that "sin" appears in math. Then changing to boldface
    certainly alters the semantics, because of the rules of mathematical
    writing. So boldface math letters should be added, just as has been done.
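
    As a small illustration of the "just as has been done" part, here is a
    Python sketch using the standard unicodedata module. The variable names
    are mine and only for illustration. It shows that the mathematical bold
    letter is a separate code point from the ordinary letter, and that
    compatibility normalization (NFKC) deliberately folds the two together
    again.

```python
import unicodedata

plain = "a"                # U+0061
math_bold = "\U0001D41A"   # U+1D41A, from the Mathematical Alphanumeric Symbols block

# Each code point has its own character name in the Unicode database.
print(unicodedata.name(plain))
print(unicodedata.name(math_bold))

# Distinct code points: plain text can carry the bold/non-bold distinction.
distinct = plain != math_bold

# NFKC treats the bold letter as a font variant of "a" and discards the
# distinction, which is fine for prose but loses information in math.
folded = unicodedata.normalize("NFKC", math_bold)
print(distinct, folded == plain)
```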

    Now add capitalization to the mix: In natural languages, capitalization
    typically does not alter the semantics of a word. This is most apparent in
    dictionaries and encyclopedias. For example, in Merriam-Webster's
    "Webster's Third New International Dictionary", all look-up words
    are uncapitalized. In math, and in computer languages, capitalization
    changes the semantics. So if a sentence would start with such a
    word, one is advised to rewrite the sentence so that it does not start
    with the word. For example, the sentence "sh uses..." might be rewritten as
    "The shell sh uses...".

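    The contrast can be sketched in Python. The one-entry dictionary and the
    function names headword and lookup are hypothetical, made up for this
    example. Case folding is used as a rough stand-in for "capitalization
    does not change the word", while the language's own identifiers stay
    case-sensitive.

```python
def headword(word: str) -> str:
    """Key under which a natural-language word is looked up (case ignored)."""
    return word.casefold()

# Hypothetical one-entry dictionary, keyed by uncapitalized headword.
entries = {"sin": "an offense against a moral law"}

def lookup(word: str):
    return entries.get(headword(word))

# Natural language: "Sin" at the start of a sentence is still the word "sin".
same_entry = lookup("Sin") == lookup("sin")

# Full case folding also equates the German sharp s with "ss".
same_street = headword("Straße") == headword("STRASSE")

# Computer language: x and X are two different names.
x = 1
X = 2
distinct_names = x != X

print(same_entry, same_street, distinct_names)
```
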
    So, since capitalization does not alter the semantics of a word, it seems
    that capital letters should not have been added to Unicode at all. However,
    capitalization can be used to communicate certain semantic information:
    start of sentence, proper noun, (in German) noun, abbreviation, etc. If one
    sticks to the semantic approach, then one should add abstract characters
    "start of sentence", "proper noun", etc., strip out, say, the uppercase
    letters, and let the rendering machine make a correct presentation.
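
    A minimal sketch of what such a rendering step might look like, in
    Python. No such abstract characters exist in Unicode; the two markers
    below are hypothetical and borrowed from the private-use area purely for
    illustration, and render and strip_markers are names I made up.

```python
START_OF_SENTENCE = "\uE000"  # hypothetical marker (private-use code point)
PROPER_NOUN = "\uE001"        # hypothetical marker (private-use code point)
MARKERS = {START_OF_SENTENCE, PROPER_NOUN}

def render(stored: str) -> str:
    """Drop the markers and capitalize the first letter after each one."""
    out = []
    capitalize_next = False
    for ch in stored:
        if ch in MARKERS:
            capitalize_next = True
        elif capitalize_next and ch.isalpha():
            out.append(ch.upper())
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)

def strip_markers(stored: str) -> str:
    """The text with the semantic markers ignored: lowercase letters only."""
    return "".join(ch for ch in stored if ch not in MARKERS)

# Stored text contains no uppercase letters, only markers.
stored = (f"{START_OF_SENTENCE}the capital of {PROPER_NOUN}slovakia "
          f"is {PROPER_NOUN}bratislava.")
rendered = render(stored)
print(rendered)
```

    Here both markers happen to produce the same presentation, a capital
    letter. The point is that the stored text records why the capital is
    there, which a plain uppercase letter does not.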

    But some of these uses are so ingrained that, for now, one must stick to a
    mixed approach: sometimes characters are separated based on semantic
    differences, and sometimes based on glyph differences. Basing a character
    set on semantic principles alone would probably require a great deal of
    work, and probably a wholly new character set, designed from scratch. The
    members of the old Unicode set would then have to be expressible as
    sequences of characters from this new set.
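
    To make the last requirement concrete, here is a rough Python sketch of
    such a mapping. Everything in it is an assumption for illustration: the
    CAPITAL marker is a hypothetical private-use code point, and only letters
    with a simple one-to-one case pair are rewritten. An old uppercase letter
    becomes the two-character sequence marker plus lowercase letter, and the
    reverse mapping restores the original text.

```python
CAPITAL = "\uE002"  # hypothetical "capitalized" marker (private-use code point)

def to_new(old: str) -> str:
    """Express each old uppercase letter as CAPITAL + its lowercase letter."""
    out = []
    for ch in old:
        lower = ch.lower()
        # Only rewrite letters whose case mapping is a simple one-to-one pair.
        if ch != lower and len(lower) == 1 and lower.upper() == ch:
            out.append(CAPITAL + lower)
        else:
            out.append(ch)
    return "".join(out)

def to_old(new: str) -> str:
    """Inverse mapping: uppercase the character that follows each marker."""
    out = []
    capital = False
    for ch in new:
        if ch == CAPITAL:
            capital = True
        elif capital:
            out.append(ch.upper())
            capital = False
        else:
            out.append(ch)
    return "".join(out)

text = "The shell sh uses..."
round_trip = to_old(to_new(text))
print(round_trip == text)
```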

      Hans Aberg