Re: List of Latin characters which look the same but are encoded differently

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Dec 29 2007 - 11:41:09 CST

Next message: Asmus Freytag: "Re: List of Latin characters which look the same but are encoded differently"

Previous message: Mark Davis: "Re: List of Latin characters which look the same but are encoded differently"
In reply to: Mark Davis: "Re: List of Latin characters which look the same but are encoded differently"
Next in thread: Asmus Freytag: "Re: List of Latin characters which look the same but are encoded differently"
Reply: Asmus Freytag: "Re: List of Latin characters which look the same but are encoded differently"

Mark Davis wrote:

> No, it isn't complete. Take a look at UTRs 36 and 39, especially the
> data in http://www.unicode.org/reports/tr39/#References

I think Karl was referring to a very specific class of confusables:

>> There are some Latin characters which look the same (at least very
>> similar, dependent of the font) but are encoded differently, all
>> because they are
>> paired with a character of the other case which are clearly
>> different.

Thus, this is about situations where two uppercase characters look
exactly the same (or almost the same) whereas their lowercase
counterparts are clearly different, or vice versa. Moreover, the scope
was limited to the Latin script. For example, Ð ~ Đ vs. ð ~ đ. As far as
I can see, Karl’s list is exhaustive, but it is quite possible that I
cannot see far enough here. (Across scripts, there are quite many
examples, of course, like the Latin A and the Greek alpha Α having
identical glyphs while their lowercase forms are quite different from
each other.)
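
The Ð ~ Đ case can be checked mechanically. The following is a minimal Python sketch, not part of the original discussion: it uses only the standard unicodedata module, and the helper name case_partner is made up for illustration. It shows that the two capitals are separate code points whose lowercase mappings differ.

```python
import unicodedata

ETH = "\u00D0"       # Ð, LATIN CAPITAL LETTER ETH
D_STROKE = "\u0110"  # Đ, LATIN CAPITAL LETTER D WITH STROKE


def case_partner(ch):
    """Return (code point, name) of the lowercase mapping of ch.

    Hypothetical helper for illustration only.
    """
    low = ch.lower()
    return f"U+{ord(low):04X}", unicodedata.name(low)


for ch in (ETH, D_STROKE):
    code, name = case_partner(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {code} {name}")
```

The capitals may be rendered identically in a given font, but the lowercase partners (U+00F0 and U+0111) are not, which is the property Karl's list is about.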

However, the additional note may have given the impression of a wider
scope:

>> Thus, the letter to be used cannot derived from its visual appearance
>> alone, but its context must be taken into account

Karl mentioned:

>> (a problem e.g. when
>> designing the labelling on a keyboard).

This is a very real and practical problem if you intend to create a
multilingual keyboard for European languages using the Latin script and
you wish to use letters as labels. How could a user, seeing “Đ”, know
whether it is eth or D with stroke? Well, if he tries it (without using
the Shift key), he will see which one it is, but it is quite possible
that he does not know that. He might know just one of the alternatives
and expect it, then get confused if it’s the wrong one (or, worse still,
not get confused but produce wrong data, with unpredictable
consequences, if he was using the Shift key to get the uppercase
letter).

It is unfortunate that uppercase letters are used to label keys. It’s
illogical, since the key produces the lowercase form, in the normal
state. And it causes problems like this. But it’s probably either too
late or too early to change such things.

One way to avoid such problems is to let the keyboard layout produce the
“stroke” characters using a dead key that effectively “puts a stroke
over the next letter”. You wouldn’t thus have any label for, say, D with
stroke; instead, it would be produced using the dead key (which might be
specially labeled) and a normal D key. The eth letter, on the other
hand, would be produced in a different way, probably using AltGr+D. The
D key should probably _not_ have either Ð or ð as an auxiliary label,
since Ð could be misleading and ð would deviate from the general idea of
keycap labels (which show the uppercase form). – This is, more or less,
what we did when designing the Finnish multilingual keyboard layout,
though the main focus was on making the key assignments _natural_, easy
to understand and easy to remember, even without added labels (which we
don’t get that easily). Once you’ve decided to use dead keys to produce
letters with diacritics and letters with a stroke, you won’t have that
many added Latin letters to cope with.
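
As a rough illustration of the dead key idea, here is a minimal Python sketch. It is not the actual Finnish layout definition: the event representation, the names STROKE, STROKE_TABLE, ALTGR_TABLE and compose, and the exact set of letters in the tables are all assumptions made for this example. The stroke letters need an explicit table because they have no canonical decomposition into a base letter plus a combining mark.

```python
# Hypothetical marker for the "stroke" dead key (an assumption of this sketch).
STROKE = "<stroke>"

# Dead key + base letter -> letter with stroke (illustrative subset).
STROKE_TABLE = {
    "d": "\u0111", "D": "\u0110",  # đ Đ
    "l": "\u0142", "L": "\u0141",  # ł Ł
    "o": "\u00F8", "O": "\u00D8",  # ø Ø
}

# AltGr + letter -> eth, produced by a separate mechanism.
ALTGR_TABLE = {"d": "\u00F0", "D": "\u00D0"}  # ð Ð


def compose(events):
    """Turn a list of key events into text.

    An event is STROKE, a plain letter such as "d", or a tuple
    ("altgr", letter). A letter with no stroke form is passed
    through unchanged after the dead key (a choice of this sketch).
    """
    out = []
    pending = False  # True right after the dead key was pressed
    for ev in events:
        if ev == STROKE:
            pending = True
        elif isinstance(ev, tuple):
            out.append(ALTGR_TABLE.get(ev[1], ev[1]))
            pending = False
        else:
            out.append(STROKE_TABLE.get(ev, ev) if pending else ev)
            pending = False
    return "".join(out)


print(compose([STROKE, "D"]))      # D with stroke, U+0110
print(compose([("altgr", "D")]))   # capital eth, U+00D0
```

With this arrangement the D keycap needs no extra label for either character, so the ambiguous glyph never has to appear on the keyboard.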

Jukka K. Korpela (“Yucca”)
http://www.cs.tut.fi/~jkorpela/


This archive was generated by hypermail 2.1.5 : Sat Dec 29 2007 - 11:45:13 CST