From: Jukka K. Korpela (email@example.com)
Date: Sat Apr 16 2005 - 06:15:12 CST
On Sat, 16 Apr 2005, Peter Kirk wrote:
> This thread is about Unicode Character Names. I am well aware of
> annotations. The problem is that most software offering to users a
> choice of characters, e.g. from a character map, does not display
> annotations fully.
Unless I have missed something, there is no simple way to include the
annotations in a program that displays characters, except by copying
them by hand from the Unicode block descriptions. Things would be easier
if the information were available in a more convenient format.
Moreover, in addition to the annotations, there are various usage notes on
individual characters at different places of the Unicode standard.
It is quite understandable that implementors include just the official
Unicode name. After all, for most characters, the name _is_ descriptive,
and often the only descriptive information that the standard has about
an individual character. It is surely more useful to have a Character Map
that shows both the glyphs and the names of characters than one that just
shows the glyphs (as in old software). But to address the problems caused
by some misleading names, we would need to identify them so that programs
can take appropriate action.
> Anyway, if the official but incorrect character name
> is X, and the annotation says something "or Y", where Y is the correct
> name, or even "not really X but Y", and such an annotation is displayed,
> the result will simply be confusion for users.
So what do you suggest? Showing just Y? I think this must ultimately
depend on the software designer and on the user's choices (if permitted by
the software). In some cases, the official names might be needed, at least
upon user request. In some cases, it might be sufficient to show just a
localized name, relying on the code number for universal identification.
The idea of using CLDR for the purposes of localized names for characters
is probably crucial to addressing the problem of misleading official
names. It allows each language community to define, to the extent it finds
useful and possible, descriptive names that are widely understood within
the community. These names could then be used in utilities like Character
Map.
But defining localized names is a huge task, especially if it needs to be
based on some kind of consensus. I would expect that for most language
forms, the localized data would consist of the names of characters
commonly used in the language itself (broadly speaking). Even this will
take quite some time and effort.
This means that many of the official names that can be regarded as
misleading would still be used even in software that implements the
localized name idea - assuming that the official names would constitute
the default names, to be shown when the locale has no name for a
character. The alternative of using the code number does not sound good,
since for the vast majority of characters, the official name is more
informative to most people than the number.
How about the following idea of overcoming the difficulty?
1. Identify the characters with misleading official names.
2. Define better names for them in the "en" locale, and preferably
in the "fr" locale as well.
3. Enhance CLDR with the feature of combining locales, in the sense
that a user's locale choice can consist of a sequence of locales
in order of preference. For example, a user's choice could mean
"use the 'de' locale for anything defined there but the 'en'
locale for things that aren't defined in the 'de' locale".
That way, when accessing a character with a misleading official name,
the information shown to the user would consist of its localized name
in the "en" locale (or maybe "fr" locale), unless a name has been defined
for it in the user's preferred locale.
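To make the lookup order concrete, here is a minimal sketch in Python of how a program might resolve a display name under this scheme. It is only an illustration under stated assumptions: the LOCALE_NAMES table, its entries and the display_name function are hypothetical and do not reflect any existing CLDR data or API. The official name comes from Python's standard unicodedata module, and the code number is used only when no name exists at all.

```python
import unicodedata

# Hypothetical per-locale name tables. Real data would have to come from
# CLDR or a similar source; these entries are illustrative only.
LOCALE_NAMES = {
    "de": {"\u00e4": "kleines a mit Umlaut"},
    # U+01A2 has the official name LATIN CAPITAL LETTER OI; the "better"
    # name given here is an assumed example of step 2 above.
    "en": {"\u01a2": "LATIN CAPITAL LETTER GHA"},
}


def display_name(char, locale_chain):
    """Return a name for char, trying each locale in order of preference."""
    for locale in locale_chain:
        name = LOCALE_NAMES.get(locale, {}).get(char)
        if name is not None:
            return name
    # No locale in the chain defines a name: fall back to the official
    # Unicode name, and to the code number only if there is no name at all.
    return unicodedata.name(char, "U+%04X" % ord(char))


if __name__ == "__main__":
    for ch in ("\u00e4", "\u01a2", "A"):
        print("U+%04X" % ord(ch), display_name(ch, ["de", "en"]))
```

With the chain "de", "en", a character named in the 'de' table gets its German name, a misleadingly named character gets the corrected 'en' name, and everything else still shows the official name.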
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/