Re: String name and Character Name

From: Hans Aberg (haberg@math.su.se)
Date: Sun Apr 24 2005 - 04:52:10 CST

    At 21:24 +0200 2005/04/23, Marcin 'Qrczak' Kowalczyk wrote:
    >My orthography for Polish can consume an infinite number of characters
    >if I treat it like Hangul was treated and encode precomposed characters
    >individually. Ok, I'm taking all odd numbers, so even ones are left
    >for other scripts :-)

    Here is an attempt to extract an underlying
    principle: One can always group symbols
    together to form a new semantic unit. The
    number of such semantic units can then be
    very large, or even potentially infinite.

    In order for a semantic unit to be called a
    character, it should probably be atomic in some
    sense. Let's examine this idea:

    The Swedish language symbol ä (a with two dots
    above) is a separate letter, not to be viewed as
    an alteration of the letter a. So it is atomic,
    and it is reasonable to enter it as a separate
    character. In German, however, it is an umlaut,
    an alteration of the letter a. So there one might
    enter it as a combination of two characters. In
    the program TeX, originally, ä would be
    constructed in the latter way. It then turns out
    that if one changes fonts, the dots do not end up
    exactly right typographically. So, because of
    this font limitation, it is suitable to have ä as
    a separate character. But now smart fonts are
    arriving, and with them one can always enter it
    as a combination of two characters. It would
    still be easy for a computer program, in a
    Swedish context, to recognize that combination as
    the single Swedish letter ä. So when examining
    what is to be viewed as atomic, a number of
    principles can be used, and the choice depends in
    part on such things as what computer software one
    wants to use.
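
    As a rough illustration of that last point, here
    is a minimal Python sketch, assuming Unicode
    normalization (the standard unicodedata module)
    is the mechanism a program uses to treat the two
    encodings of ä as one letter. The helper names
    same_letter and count_letters are made up for
    this example, and language-specific matters such
    as Swedish sorting order are not covered.

```python
import unicodedata

# Two encodings of the same visible symbol.
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS (one code point)
combined = "a\u0308"     # LATIN SMALL LETTER A + COMBINING DIAERESIS (two code points)


def same_letter(x: str, y: str) -> bool:
    """Hypothetical helper: compare two strings after composing both to NFC."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


def count_letters(text: str) -> int:
    """Hypothetical helper: count code points that are not combining marks,
    after composing to NFC, as a crude stand-in for 'atomic' letters."""
    composed = unicodedata.normalize("NFC", text)
    return sum(1 for ch in composed if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    print(precomposed == combined)            # raw code points differ
    print(same_letter(precomposed, combined)) # equal once normalized
    print(len("ba\u0308r"), count_letters("ba\u0308r"))
```

    Whether the program stores the precomposed or
    the decomposed form is then a software choice;
    either can be converted to the other.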

    -- 
       Hans Aberg
    

