Re: String name and Character Name

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Apr 16 2005 - 06:15:12 CST

    On Sat, 16 Apr 2005, Peter Kirk wrote:

    > This thread is about Unicode Character Names. I am well aware of
    > annotations. The problem is that most software offering to users a
    > choice of characters, e.g. from a character map, does not display
    > annotations fully.

    Unless I have missed something, there is no simple way to include the
    annotations in a program that displays characters, except by copying
    them by hand from the Unicode block descriptions. Things would be easier
    if the information were available in a more convenient format.

    Moreover, in addition to the annotations, there are various usage notes on
    individual characters at different places of the Unicode standard.

    It is quite understandable that implementors include just the official
    Unicode name. After all, for most characters, the name _is_ descriptive,
    and often the only descriptive information that the standard has about
    an individual character. It is surely more useful to have a Character Map
    that shows both the glyphs and the names of characters than one that just
    shows the glyphs (as in old software). But to address the problems caused
    by some misleading names, we would need to identify them so that programs
    can take appropriate action.

    > Anyway, if the official but incorrect character name
    > is X, and the annotation says something "or Y", where Y is the correct
    > name, or even "not really X but Y", and such an annotation is displayed,
    > the result will simply be confusion for users.

    So what do you suggest? Showing just Y? I think this must ultimately
    depend on the software designer and on the user's choices (if permitted by
    the software). In some cases, the official names might be needed, at least
    upon user request. In some cases, it might be sufficient to show just a
    localized name, relying on the code number for universal identification.

    The idea of using CLDR for the purposes of localized names for characters
    is probably crucial to addressing the problem of misleading official
    names. It allows each language community to define, to the extent it finds
    useful and possible, descriptive names that are widely understood within
    the community. These names could then be used in utilities like Character
    Map.

    But defining localized names is a huge task, especially if it needs to be
    based on some kind of consensus. I would expect that for most language
    forms, the localized data would consist of the names of characters
    commonly used in the language itself (broadly speaking). Even this will
    take quite some time and effort.

    This means that many of the official names that can be regarded as
    misleading would still be used even in software that implements the
    localized name idea - assuming that the official names would constitute
    the default names, to be shown when the locale has no name for a
    character. The alternative of using the code number does not sound good,
    since for the vast majority of characters, the official name is more
    informative to most people than the number.

    How about the following idea of overcoming the difficulty?
    1. Identify the characters with misleading official names.
    2. Define better names for them in the "en" locale, and preferably
       in the "fr" locale as well.
    3. Enhance CLDR with the feature of combining locales, in the sense
       that a user's locale choice can consist of a sequence of locales
       in order of preference. For example, a user's choice could mean
       "use the 'de' locale for anything defined there but the 'en'
       locale for things that aren't defined in the 'de' locale".

    That way, when accessing a character with a misleading official name,
    the information shown to the user would consist of its localized name
    in the "en" locale (or maybe "fr" locale), unless a name has been defined
    for it in the user's preferred locale.
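
    As a rough sketch of the lookup order suggested above (Python; the
    table LOCALIZED_NAMES and the function display_name are hypothetical,
    and real data would have to come from CLDR or a similar source, which
    does not offer this today): try each locale in the user's preference
    sequence, then fall back to the official Unicode name, and only then
    to the code number. The entry for U+01A2 is just an illustration of a
    better name placed in the "en" locale.

```python
import unicodedata

# Hypothetical localized-name tables, keyed by locale and then by character.
# These entries are illustrative only; no such CLDR data is assumed to exist.
LOCALIZED_NAMES = {
    "de": {"\u00e4": "kleines a mit Umlaut"},
    # U+01A2 is officially LATIN CAPITAL LETTER OI, but the letter is "gha".
    "en": {"\u01a2": "LATIN CAPITAL LETTER GHA"},
}


def display_name(char, locale_chain):
    """Return the name to show for char, given locales in preference order."""
    # Step 1: the first locale in the chain that defines a name wins.
    for locale in locale_chain:
        name = LOCALIZED_NAMES.get(locale, {}).get(char)
        if name is not None:
            return name
    # Step 2: default to the official Unicode name; if the character has
    # none (e.g. a control character), fall back to the code number.
    return unicodedata.name(char, f"U+{ord(char):04X}")


if __name__ == "__main__":
    chain = ["de", "en"]  # "use 'de' where defined, otherwise 'en'"
    for ch in ["\u00e4", "\u01a2", "A"]:
        print(f"U+{ord(ch):04X}", display_name(ch, chain))
```

    With the chain ["de", "en"], U+00E4 gets its German name, U+01A2 gets
    the corrected "en" name, and everything else still shows the official
    name, which is the default behaviour discussed above.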

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

