Re: String name and Character Name

From: Philippe Verdy (
Date: Thu Apr 21 2005 - 08:28:17 CST

    From: "Asmus Freytag" <>
    >I have been watching this exchange for a while without dipping in an oar,
    >but some of the more recent postings still show such deep misunderstanding
    >of how to use the Unicode Standard that I thought a contribution
    > It was suggested that:
    >>A list of character names is useful only if it is entirely reliable, or at
    >>least is moving towards being so. If this list contains only one error
    >>(and there are a lot more) which is not going to be corrected, then the
    >>list is worthy of nothing but to be thrown out and replaced - if only by
    >>another almost identical list, which can be corrected.
    > This appears to me a rather absolutist position. Any large list is likely
    > to contain some errors. This is
    > bad enough, but the understanding of what is correct, may not be shared.
    > Therefore, attaining a status of 'proven correct' is effectively
    > impossible, even if it can be contemplated as a theoretical ideal.

    Yes, but nothing prevents creating such a "common" list (like the CLDR
    resources), which would just represent the best known practice for each
    language.

    As long as this list is not made immutable in the standard, but clearly said
    to be correctable over time, no one will use these names to identify
    characters in internal critical algorithms (like in the Unicode regexps
    using "\N{NAME}" specifiers that assume that this name is fixed over time
    even if it looks "wrong").
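
    As an illustration of that kind of dependency, here is a minimal sketch in
    Python (chosen only as an example language; the thread itself does not
    mention it). Its "\N{NAME}" string escape and unicodedata.lookup() both
    resolve a character from its standard name, so any program written this
    way relies on that name never changing. The helper resolve() and the
    localized name passed to it are hypothetical.

```python
import unicodedata

def resolve(name):
    """Return the character for a standard Unicode name, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# The \N{NAME} escape is resolved from the standard character name.
from_escape = "\N{LATIN SMALL LETTER E WITH ACUTE}"

# The same resolution done at run time.
from_lookup = resolve("LATIN SMALL LETTER E WITH ACUTE")

# A localized (correctable) name is not an identifier and does not resolve.
from_localized = resolve("e accent aigu")  # hypothetical localized name
```

    Only the standard name works as an identifier here; a localized or
    corrected name is simply not found, which is why such a list must stay
    out of algorithms like this one.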

    But this localizable list will still be far better than the list of standard
    names (which GUI applications would use only when the localizable list has
    no defined name yet for some character). So users of a given language will
    see the character names they expect, and will then see the default technical
    names for characters they are not used to.

    To make this work, one also needs to define a locale within which only the
    default standard names will be seen. This should be the POSIX "C" locale in
    C/C++ or the default locale in Java, that will contain these standard names.

    English-reading users will thus see the non-standard names "translated"
    into English (following the best practices known for English) if they are
    using programs in the English locale; distinctions between US and British
    English character names will also be possible in the GUIs.
    Preferably, the localized character names should not be fully capitalized
    (so that distinction between fully capitalized standard ISO/IEC/Unicode
    character names and localized names remains possible).
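
    The lookup described above could be sketched as follows, again in Python
    and only as an assumption about how an implementation might look. The
    table LOCALIZED_NAMES, its entries, and the functions fallback_chain(),
    display_name() and is_standard_name() are all hypothetical; the entries
    are illustrative and are not taken from CLDR.

```python
import unicodedata

# Hypothetical, correctable per-locale tables (code point -> localized name).
# Illustrative entries only; none of this data comes from CLDR.
LOCALIZED_NAMES = {
    "en": {0x0023: "Number sign", 0x002E: "Period"},
    "en_GB": {0x0023: "Hash", 0x002E: "Full stop"},
}

def fallback_chain(locale):
    """Yield the locale, then its parents: 'en_GB' -> 'en'."""
    parts = locale.split("_")
    while parts:
        yield "_".join(parts)
        parts.pop()

def display_name(ch, locale="C"):
    """Return a name for display only, never for identifying a character."""
    if locale != "C":
        for loc in fallback_chain(locale):
            name = LOCALIZED_NAMES.get(loc, {}).get(ord(ch))
            if name is not None:
                return name
    # "C" locale, or no localized name yet: fall back to the standard name
    # (or to the code point if the character has no name at all).
    return unicodedata.name(ch, "U+%04X" % ord(ch))

def is_standard_name(name):
    """Standard names are fully capitalized; localized ones should not be."""
    return name == name.upper()
```

    With these tables, display_name("#", "en_GB") gives the British name,
    display_name("#", "en_US") falls back to the generic English one, and
    display_name("#", "C") always gives the standard name. The capitalization
    test is what lets a GUI tell which kind of name it ended up showing.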

    > Unicode has always recognized the fact that many characters have multiple
    > common names, and provided aliases. The suggestion of potentially
    > providing a set of preferred aliases in contexts where user interfaces
    > must work from a single list has been made repeatedly.
    > Rather than continuing the fruitless bickering over the status of the
    > formal character names, energies should perhaps be reserved for collecting
    > any additional information that would be needed to design something that
    > fits the stated purpose of allowing "(English speaking) users to identify
    > characters by name (or description)".

    Fully agree.

    But the description notes in the UCD are not enough, because they mix usage
    notes (for example, the list of languages in which a character is used) with
    alternate character names. There's a real need for a more concise and
    parsable list of character names appropriate for each locale, with the
    requirement that this list should not define closed standard names but best
    practices that can be corrected at any time (so that no program will depend
    on the listed names to identify a precise character).

    Best-practice open standards are common today on the Internet and have
    proved to be very useful. It's good to know that they can be revised, that
    they are not made mandatory for compliant applications, and that users are
    allowed to tweak the implementation of these open standards a bit to match
    their actual needs. When most users feel that a tweak is always needed, it's
    good to integrate it into the revised best-practice standard. The CLDR
    project is such an open standard, based on shared knowledge and experience
    of best practices.
    We can avoid a lot of discussions because the lists are maintained in
    separate locales/languages (so the discussions will only occur within a
    single locale, where a consensus is more easily found).

    This archive was generated by hypermail 2.1.5 : Thu Apr 21 2005 - 08:29:11 CST