From: Philippe Verdy (email@example.com)
Date: Wed Nov 09 2005 - 11:11:47 CST
From: "Erkki Kolehmainen" <firstname.lastname@example.org>
> Philippe Verdy wrote:
>> Why isn't there a project in CLDR to create such supplementary data for
>> translated character names (that won't be identifiers, the only
>> identifiers being the normative 4-to-6-digit hexadecimal code points)?
> This would be a truly major project for all the languages, and each of
> them would require an unprecedented consensus for all the names. In
> Finland, we have translated the names of the Multilingual European Subset
> 2 (MES-2) into Finnish and made the list freely available as a
> recommendation - not a standard - of the Finnish Standards Association
> SFS. We are now starting the process to expand the list, but we are only
> considering the addition of a few hundred character names.
Not that huge: the "root" locale can be fed with existing standard names,
and then what is needed is a repository to store the per-language
corrections when they are attested for that language. For most characters,
no translation would be needed, and the normative name would be inherited
from "root". So those that complain about incorrect English names could
perform these corrections in the English locale, but not in the "root"
locale.
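A rough sketch of this inheritance, in Python. The OVERRIDES table, the
function names, and the locale codes are made up for illustration, and the
two entries are only sample data. Python's standard unicodedata module
stands in for the "root" locale here:

```python
import unicodedata

# Hypothetical per-language tables: only attested corrections or
# translations are stored, everything else is inherited from "root".
OVERRIDES = {
    # An English-locale correction of a well-known misnomer.
    "en": {0x01A2: "LATIN CAPITAL LETTER GHA"},
    # A sample French translation.
    "fr": {0x0041: "LETTRE MAJUSCULE LATINE A"},
}


def fallback_chain(locale):
    """Yield 'fr_CA', then 'fr' (the root lookup is done by the caller)."""
    parts = locale.split("_")
    while parts:
        yield "_".join(parts)
        parts.pop()


def char_name(cp, locale):
    """Return the most specific name known for code point cp."""
    for loc in fallback_chain(locale):
        name = OVERRIDES.get(loc, {}).get(cp)
        if name is not None:
            return name
    # "root": the normative name, or the code point itself if unnamed.
    return unicodedata.name(chr(cp), "U+%04X" % cp)


if __name__ == "__main__":
    print(char_name(0x0041, "fr_CA"))  # found in "fr"
    print(char_name(0x0042, "fr_CA"))  # inherited from "root"
    print(char_name(0x01A2, "en"))     # corrected in "en" only
```

The "root" data is never edited: a correction lives only in the locale
that attests it, and every other locale keeps inheriting the normative
name.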
But I agree that once such a project is started, there will be lots of updates
for each language, trying to add more and more character name translations.
The first thing to translate would be the basic set of letters, digits, and
punctuation needed for the language. Then one could expand it to cover a
significant subset of the native script, and finally cover the whole script,
and symbols, before attempting to cover other scripts. At that time, the
basic English collections would have been covered too, as well as major
languages, so this would facilitate the creation of translations for
languages that use other scripts than the existing translations.
Such a database would also resolve the various ambiguities that some
existing names cause: it is difficult to guess which character is actually
meant by the normative name if you have not seen and compared the
representative glyphs and the usage notes (when they are present in the
Unicode names list file). Having to download a whole PDF (possibly a big
one, for example with Han ideographs) to get that information is too
demanding, when a better name could improve the correct interpretation of
names and would facilitate searching for characters by name.
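To illustrate what a search by name looks like against the normative
names alone, here is a minimal sketch in Python. The function name and the
default range are my own choices, and the standard unicodedata module is
used as the source of names. A per-language table would simply be
consulted before the normative name:

```python
import unicodedata


def search_names(words, start=0x0000, end=0x2FFF):
    """Return (code point, name) pairs whose name contains every word."""
    wanted = [w.upper() for w in words]
    hits = []
    for cp in range(start, end + 1):
        # Unnamed code points (controls, unassigned) yield "".
        name = unicodedata.name(chr(cp), "")
        if name and all(w in name for w in wanted):
            hits.append((cp, name))
    return hits


if __name__ == "__main__":
    for cp, name in search_names(["greek", "omega"]):
        print("U+%04X %s" % (cp, name))
```

Such a search only finds what the name says: U+2126 OHM SIGN looks like an
omega but is not returned for "omega", which is the kind of case where an
additional, better name would help.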
For some scripts, the database should also contain additional resource keys
to retrieve extra information (notably in Han ideographs, for which a search
by radical and strokes would be helpful, as well as search by
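A search by radical and strokes could be sketched as follows in Python.
The sample values are written in the "radical.strokes" style of the Unihan
kRSUnicode field, but the table here is a tiny hand-made sample and the
function names are hypothetical. A real index would be built from the full
Unihan data:

```python
from collections import defaultdict

# Tiny illustrative sample: code point -> "radical.additional strokes".
SAMPLE_RS = {
    0x6C34: "85.0",  # water radical itself
    0x6C5F: "85.3",
    0x6CB3: "85.5",
    0x6728: "75.0",  # tree radical itself
    0x6797: "75.4",
}


def build_index(rs_data):
    """Map (radical number, additional strokes) to a list of characters."""
    index = defaultdict(list)
    for cp, value in sorted(rs_data.items()):
        radical, strokes = value.split(".")
        # A trailing apostrophe marks a simplified radical form; this
        # sketch ignores the distinction.
        index[(int(radical.rstrip("'")), int(strokes))].append(chr(cp))
    return index


def lookup(index, radical, strokes):
    """Return the characters filed under this radical and stroke count."""
    return index.get((radical, strokes), [])


if __name__ == "__main__":
    idx = build_index(SAMPLE_RS)
    print(lookup(idx, 85, 3))
    print(lookup(idx, 75, 4))
```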
Finally, the representative glyphs could also be stored as bitmaps with a
limited resolution (or SVG?) in the "root" locale, in another extra database
(would be helpful mostly for Han ideographs), but this should not compete
with font implementations (so no glyph properties, no kerning pairs, etc.;
only the rendered graphic at a single size would be stored). This would
allow building input editors and other character selectors that present the
characters in a grid.
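The grid itself is simple to build. A minimal sketch in Python, where the
characters themselves stand in for the stored bitmaps and the function
name and the column count are arbitrary choices of mine:

```python
def char_grid(start, end, columns=16):
    """Split the code point range start..end (inclusive) into rows."""
    cells = [chr(cp) for cp in range(start, end + 1)]
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


if __name__ == "__main__":
    # Basic Latin capital letters, eight per row.
    for row in char_grid(0x0041, 0x005A, columns=8):
        print(" ".join(row))
```

A selector would replace each cell with the stored image for that code
point, so it could show the character even when no installed font covers
it.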
This archive was generated by hypermail 2.1.5 : Wed Nov 09 2005 - 11:14:01 CST