Re: Why people still want to encode precomposed letters

From: Hans Aberg (haberg@math.su.se)
Date: Mon Nov 24 2008 - 15:27:10 CST

    On 24 Nov 2008, at 19:19, Jukka K. Korpela wrote:

    >> Perhaps one only needs to list the combinations that belong to
    >> the proper language alphabets. In Swedish that would be
    >> "ijåäöÅÄÖ". Other combinations, like é, would not be as
    >> important to get right in Swedish, though it is imported from
    >> French, where it would appear. But it illustrates the idea.
    >
    > Technically, in the Unicode sense, “i” and “j” do not
    > contain a diacritic mark but are atomic (completely non-
    > decomposable) characters, even though a discussion of diacritic
    > marks must address the issue of what happens to the dot in them.

    Just like the other letters "åäöÅÄÖ", though some of those are
    also producible as combining character sequences. So this is purely
    a rendering question. If one puts diacritical marks on top of "i"
    or "j", the dots should be removed, which is why TeX has undotted
    versions (\i and \j). My guess is that if one were to design a font
    model where diacritics could be constructed at typesetting time, it
    would be convenient to do likewise.

    > The description of characters used in a language or in a locale is
    > addressed in the CLDR, see
    > http://www.unicode.org/reports/tr35/#Character_Elements
    > though very unsatisfactorily, if you ask me. It only addresses
    > letters, and it defines rather arbitrarily just two character sets
    > for a language. Surely, for example, “e” is more basically a
    > letter in English than “é” is, but “é” in turn is more of
    > an English letter than “ē” is. Moreover, the pragmatic reasons
    > for defining the character repertoires contain quite irrelevant
    > points like “choosing among character encodings.”

    I think there is a core that is quite fixed. The other sets are
    just imports, more dynamic, and depend on what is being accepted.
    Unicode itself may speed this process up, as there are no practical
    restraints on using them.
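
    As a sketch of that core-versus-imports distinction, in the spirit
    of CLDR's main and auxiliary exemplar sets from TR35 (the set
    contents below are illustrative only, not the official CLDR data
    for Swedish):

        # Hypothetical "core" and "imported" sets for Swedish; the
        # contents are examples, not official CLDR exemplar data.
        core = set("abcdefghijklmnopqrstuvwxyzåäö")
        imports = set("éü")  # accepted loanword letters, more dynamic

        def classify(ch: str) -> str:
            ch = ch.lower()
            if ch in core:
                return "core"
            if ch in imports:
                return "import"
            return "foreign"

        print([classify(c) for c in "öéē"])  # ['core', 'import', 'foreign']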

    > Anyway, describing the characters commonly used in a language is
    > useful for the purposes of font design. It is a difficult task,
    > though, and controversial. In practice, such descriptions are
    > probably more useful to people choosing between fonts than font
    > designers. For example, when choosing a font for Swedish text, you
    > should check that å, ä, ö, é, Å, Ä, Ö, É all look good. This
    > should be self-evident, but it often isn’t. Moreover, less common
    > characters are even more easily ignored. Thus, lists of characters
    > used in a language (at various levels of usage) are directly useful
    > for constructing test documents for font testing.

    I think it is most important for the core letters to be properly
    designed. But the example of "Å" in the Caledonia font from 1967
    being lowered to exactly the height of "l" shows that electronic
    typesetting has already caused poor designs to happen. Font
    designers probably do not keep track of these subtleties anymore.

    So a typesetting model sufficiently advanced to make such
    adjustments when combining characters on the fly might in fact
    produce better results than those now in use.
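
    For instance, a test line of the kind Jukka describes can be
    generated mechanically from a per-language character list, putting
    each precomposed letter next to its combining-sequence equivalent
    so that the on-the-fly composition path is exercised as well
    (again only a sketch, in Python):

        import unicodedata

        # Pair each precomposed Swedish letter with its NFD combining
        # sequence, so both rendering paths can be compared in a font.
        swedish_test = "åäöéÅÄÖÉ"
        pairs = [f"{ch} = {unicodedata.normalize('NFD', ch)}"
                 for ch in swedish_test]
        print("  ".join(pairs))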

       Hans


