Re: String name and Character Name

From: Asmus Freytag
Date: Wed Apr 20 2005 - 16:27:56 CST

    I have been watching this exchange for a while without dipping in an oar,
    but some of the more recent postings still show such deep misunderstanding
    of how to use the Unicode Standard that I thought a contribution worthwhile.

    It was suggested that:

    >A list of character names is useful only if it is entirely reliable, or at
    >least is moving towards being so. If this list contains only one error
    >(and there are a lot more) which is not going to be corrected, then the
    >list is worthy of nothing but to be thrown out and replaced - if only by
    >another almost identical list, which can be corrected.

    This appears to me a rather absolutist position. Any large list is likely
    to contain some errors. That is bad enough, but the understanding of what
    is correct may not even be shared. Therefore, attaining a status of
    'proven correct' is effectively impossible, even if it can be contemplated
    as a theoretical ideal.

    >But if there is a majority for not formally deprecating this unreliable
    >list, I shall let people continue to incorporate this set of errors into
    >their software. Just don't expect me to buy any software which uses it.
    >>a non-existent problem... all Unicode character names are adequate for
    >>their intended purpose
    >Totally untrue!

    Not so fast here. The intended purpose of the character names is very much
    at issue here, and it does not explicitly include the task of supporting
    users in identifying characters.

    > Some of the errors are simply annoying, but others (if displayed to
    > users) cause users to choose the wrong character and so lead to total
    > confusion. Don't you think users would be confused if A was called B and
    > vice versa? Some of the errors are as blatant and confusing as that, in
    > some other script. [such as].... the ZARQA/ZINOR mix-up ...

    While in some cases there may be near universal agreement on a (single)
    name for a character, in other cases, not least that of SOLIDUS / SLASH,
    even ordinary users may not recognize what is undoubtedly a 'correct' and
    certainly correctly spelled name. At the same time, no one is deeply
    troubled by lamda vs. lambda, except perhaps when searching automatically
    in a list.

    Because of these issues, some longstanding and some unavoidable, the
    intended purpose of the nameslist was deliberately *reduced* to providing
    a unique and immutable identifier, subject to the rules of Annex L in
    ISO/IEC 10646 insofar as enforced by WG2.

    Unicode has always recognized the fact that many characters have multiple
    common names, and provided aliases. The suggestion of potentially providing
    a set of preferred aliases in contexts where user interfaces must work from
    a single list has been made repeatedly.
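
    To illustrate the distinction between the immutable formal names and a
    set of preferred aliases for user interfaces, here is a minimal sketch in
    Python. It assumes the standard unicodedata module for the formal names;
    the PREFERRED_ALIASES table and the functions find_char and display_label
    are hypothetical, invented for this illustration, and are not part of any
    Unicode data file.

```python
import unicodedata

# Hypothetical UI-level aliases mapped onto formal character names.
# This table is an assumption made for illustration only; it is not
# drawn from the Unicode Standard's data files.
PREFERRED_ALIASES = {
    "SLASH": "SOLIDUS",
    "GREEK SMALL LETTER LAMBDA": "GREEK SMALL LETTER LAMDA",
}


def find_char(label: str) -> str:
    """Resolve a user-supplied label to a character.

    The label is normalized, translated through the alias table if it
    appears there, and then looked up by its formal name.
    """
    key = " ".join(label.upper().split())
    formal = PREFERRED_ALIASES.get(key, key)
    return unicodedata.lookup(formal)  # raises KeyError for unknown names


def display_label(ch: str) -> str:
    """Return the preferred alias for a character if one is defined,
    otherwise fall back to the formal name."""
    formal = unicodedata.name(ch)
    for alias, target in PREFERRED_ALIASES.items():
        if target == formal:
            return alias
    return formal
```

    In this arrangement the formal name stays untouched as the identifier,
    while whatever a user interface displays or accepts can be corrected
    freely in the alias layer.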

    Rather than continuing the fruitless bickering over the status of the
    formal character names, energies should perhaps be reserved for collecting
    any additional information that would be needed to design something that
    fits the stated purpose of allowing "(English speaking) users to identify
    characters by name (or description)".

