Re: List of Latin characters which look the same but are encoded differently

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Dec 29 2007 - 11:41:09 CST

    Mark Davis wrote:

    > No, it isn't complete. Take a look at UTRs 36 and 39, especially the
    > data in http://www.unicode.org/reports/tr39/#References

    I think Karl was referring to a very specific class of confusables:

    >> There are some Latin characters which look the same (at least very
    >> similar, dependent of the font) but are encoded differently, all
    >> because they are
    >> paired with a character of the other case which are clearly
    >> different.

    Thus, this is about situations where two uppercase characters look
    exactly the same (or almost the same) whereas their lowercase
    counterparts are clearly different, or vice versa. Moreover, the scope
    was limited to the Latin script. For example, Ð ~ Đ vs. ð ~ đ. As far as
    I can see, Karl’s list is exhaustive, but it is quite possible that I
    cannot see far enough here. (Across scripts, there are quite a few
    examples, of course, like the Latin A and the Greek alpha Α having
    identical glyphs while their lowercase forms are quite different from
    each other.)
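To make the case-pair point concrete, here is a minimal sketch in Python using the standard unicodedata module. It only covers the two pairs mentioned above (eth vs. D with stroke, and Latin A vs. Greek alpha); the helper names describe and lowercase_differs are made up for illustration, and the exact character names printed depend on the Unicode data shipped with the interpreter.

```python
import unicodedata

# Uppercase pairs that can look alike, taken from the examples in the text.
PAIRS = [
    ("\u00D0", "\u0110"),  # Ð (eth) vs. Đ (D with stroke)
    ("\u0041", "\u0391"),  # Latin A vs. Greek capital alpha
]

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def lowercase_differs(a, b):
    """True if a and b are distinct characters whose lowercase forms also differ."""
    return a != b and a.lower() != b.lower()

for upper_a, upper_b in PAIRS:
    for ch in (upper_a, upper_b):
        # Show each uppercase letter next to its lowercase counterpart.
        print(describe(ch), "->", describe(ch.lower()))
    print("lowercase forms differ:", lowercase_differs(upper_a, upper_b))
```

The similarity of the glyphs is a matter of font design and cannot be checked this way; the sketch only shows that the encoded characters and their lowercase mappings are distinct.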

    However, the additional note may have given the impression of a wider
    scope:

    >> Thus, the letter to be used cannot derived from its visual appearance
    >> alone, but its context must be taken into account

    Karl mentioned:

    >> (a problem e.g. when
    >> designing the labelling on a keyboard).

    This is a very real and practical problem if you intend to create a
    multilingual keyboard for European languages using the Latin script and
    you wish to use letters as labels. How could a user, seeing “Đ”, know
    whether it is eth or D with stroke? Well, if he tries it (without using
    the Shift key), he will see which one it is, but it is quite possible
    that he does not know that. He might know just one of the alternatives
    and expect it, then get confused if it’s the wrong one (or, worse still,
    not get confused but produce wrong data, with unpredictable
    consequences, if he was using the Shift key to get the uppercase
    letter).

    It is unfortunate that uppercase letters are used to label keys. It’s
    illogical, since the key produces the lowercase form, in the normal
    state. And it causes problems like this. But it’s probably either too
    late or too early to change such things.

    One way to avoid such problems is to let the keyboard layout produce the
    “stroke” characters using a dead key that effectively “puts a stroke
    over the next letter”. You wouldn’t thus have any label for, say, D with
    stroke; instead, it would be produced using the dead key (which might be
    specially labeled) and a normal D key. The eth letter, on the other
    hand, would be produced in a different way, probably using AltGr+D. The
    D key should probably _not_ have either Ð or ð as an auxiliary label,
    since Ð could be misleading and ð would deviate from the general idea of
    keycap labels (which show the uppercase form). – This is, more or less,
    what we did when designing the Finnish multilingual keyboard layout,
    though the main focus was on making the key assignments _natural_, easy
    to understand and easy to remember, even without added labels (which we
    don’t get that easily). Once you’ve decided to use dead keys to produce
    letters with diacritics and letters with a stroke, you won’t have that
    many added Latin letters to cope with.
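The dead-key idea can be sketched roughly as follows, again in Python. This is an illustration under stated assumptions, not the actual Finnish layout: the "<stroke>" token, the "AltGr+" event notation, the function name type_keys and the particular set of stroked letters in the table are all hypothetical.

```python
# Hypothetical token standing for a press of the "stroke" dead key.
STROKE_DEAD_KEY = "<stroke>"

# Assumed dead-key table: base letter -> letter with stroke.
STROKE_COMPOSE = {
    "d": "\u0111", "D": "\u0110",  # đ, Đ
    "l": "\u0142", "L": "\u0141",  # ł, Ł
    "o": "\u00F8", "O": "\u00D8",  # ø, Ø
}

# Assumed AltGr level: eth is reached through AltGr+D, not the dead key.
ALTGR = {"d": "\u00F0", "D": "\u00D0"}  # ð, Ð

def type_keys(events):
    """Turn a list of key events into the text they would produce.

    An event is a plain character, STROKE_DEAD_KEY, or "AltGr+" + character.
    """
    out = []
    pending_stroke = False
    for ev in events:
        if ev == STROKE_DEAD_KEY:
            # The dead key produces nothing by itself; it modifies the next key.
            pending_stroke = True
            continue
        if ev.startswith("AltGr+"):
            key = ev[len("AltGr+"):]
            ch = ALTGR.get(key, key)
        else:
            ch = ev
        if pending_stroke:
            # Simplification: a letter without a stroked form passes through unchanged.
            ch = STROKE_COMPOSE.get(ch, ch)
            pending_stroke = False
        out.append(ch)
    return "".join(out)

print(type_keys([STROKE_DEAD_KEY, "D"]))  # D with stroke
print(type_keys(["AltGr+D"]))             # uppercase eth
```

With such a split, the two look-alike capitals are reached through different key sequences, so no keycap label has to tell them apart.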

    Jukka K. Korpela (“Yucca”)
    http://www.cs.tut.fi/~jkorpela/


