Re: String name and Character Name

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Apr 28 2005 - 11:44:52 CST


    On Thu, 28 Apr 2005, Hans Aberg wrote:

    > Glancing briefly at the many local names of the "@" symbol, it
    > suggests that Unicode should supply such localized descriptions.

    I think your message, which I quote only in part, nicely summarizes an
    idea that many people have been thinking about - with the proviso that
    "Unicode" in this context means the Consortium rather than the Standard
    or, more precisely, the CLDR work coordinated by the Consortium.

    Defining localized names for characters (or other things) is primarily
    the business of each language community, so it should take place within
    those communities.

    Mentioning the "@" character opens a can of worms, though. It's one of the
    characters, along with "~" for example, that have a wide range of names in
    many languages, sometimes even heavily debated. Sometimes we can even
    isolate cultural environments (subcultures) that favor one or another word
    for a character. Although it would be possible, within the general idea of
    CLDR, to define locales corresponding to such environments, this is
    probably not a feasible idea.

    Normally each character has at most one name in a particular language, and
    in many contexts _a_ name is needed. For example, when a speech
    synthesizer cannot deal with a character in any better way, it should
    probably say its name, and we don't want it to speak a dozen aliases.
    In a character selection menu, on the other hand, multiple names can be
    useful if they help the user identify the character.

    Thus, the best format for definitions of localized names for characters
    would probably be an ordered list of names, with aliases welcomed but
    without names that might be seriously misleading (even if they are in
    use). An application could then use the first name, all the names, or the
    first n names for some value of n.
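
    As a rough sketch of how an application might consume such an ordered
    list: the table, its entries, and the function names below are all
    hypothetical and are not taken from CLDR or any existing data file. The
    only assumption carried over from the text is that the preferred name
    comes first.

```python
# Hypothetical data: ordered lists of localized names, preferred name first.
# The entries are illustrative only, not taken from CLDR or any standard.
LOCALIZED_NAMES = {
    ("en", "@"): ["at sign", "commercial at"],
    ("en", "~"): ["tilde"],
}


def names_for(char, locale, limit=None):
    """Return up to `limit` names for `char` in `locale`, preferred first.

    limit=None returns every listed name; an unknown (locale, char) pair
    returns an empty list.
    """
    names = LOCALIZED_NAMES.get((locale, char), [])
    return list(names) if limit is None else names[:limit]


def spoken_name(char, locale):
    """Return the single name a speech synthesizer would say, or None."""
    names = names_for(char, locale, limit=1)
    return names[0] if names else None
```

    A speech synthesizer would call spoken_name(), while a character
    selection menu could call names_for() with no limit, or with a small
    limit, to show the aliases as well.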

    There's perhaps a more urgent localization need: names of Unicode blocks,
    and maybe names of character collections as well. Such names already
    appear in localized software such as Character Map, and many of them are
    seriously misleading, besides differing needlessly from one application
    to another. There are far fewer blocks than characters, so this would be
    just hard work, not huge work.

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

