Re: String name and Character Name

From: Philippe Verdy
Date: Tue Apr 12 2005 - 15:25:30 CST


    From: "Peter Kirk"
    > On 12/04/2005 08:04, John Hudson wrote:
    >> The fact that names were originally assigned to be meaningful does not
    >> mean that individual names remain 'full of meaning'. The Latvian letters
    >> to which I referred earlier, which Unicode names identify as being 'WITH
    >> CEDILLA', are again a good example: yes, these names were originally
    >> assigned in the belief that they had a meaningful relationship to the
    >> identity of these letters. As it turns out, they were misnamed, because
    >> the mark below these letters is not a cedilla. As far as I'm concerned,
    >> this means that these particular names are not meaningful, because they
    >> do not accurately reflect the identity of the letters. This doesn't mean
    >> that they were not intended to be meaningful, but I reckon meaningfulness
    >> in terms of usefulness in describing reality. Since the name 'WITH
    >> CEDILLA' does not describe the real identity of these letters, the name
    >> cannot be said to be either useful or meaningful. The same is,
    >> regrettably, true of the Tamil aytham: the name assigned by Unicode is
    >> incorrect and hence meaningless as a means of describing the actual
    >> identity of this character.
    > John, you asked separately why these names should be deprecated. And
    > surely you have answered the question. These names are incorrect, do not
    > describe the real identity of the characters, are meaningless and useless.
    > So best to deprecate them entirely, and replace them with a list of
    > meaningful names - which can be changed, both to correct errors and
    > because names may actually change with time. Most of this new list would
    > be identical with the old list. But a complete new list would be much
    > clearer and more useful to end users than the original list plus a
    > separate set of corrigenda - and all the more so if the corrigenda are not
    > actually formatted as alternate character names.

    Instead of focusing so much on the proper names of characters used in
    various languages or cultures, why not start a localization project within
    the CLDR to create localized character names for each language?

    This would also allow referencing the correct English name even if the
    normative name remains unchanged in the UCD. After all, normative character
    names are not required to follow the usage of any particular language. They
    are used only for cross-referencing in standardization discussions; after
    that, everybody uses the assigned code points.

    If the CLDR database cannot host such a localization project, could it be
    started somewhere else?

    At the very least, the CLDR project should start by incorporating the
    normative English and French character names from ISO/IEC 10646. The names
    could then be corrected there, and other languages could be added to cover
    all or part of the localized alphabets, with a fallback to another language
    for missing resources (the default locale should use the normative
    Unicode/ISO/IEC 10646 names, not necessarily the English-US ones).
    This would solve the recently discussed issues of naming '#' (hash, number
    sign, dièse) in various languages. It would also solve the problem of
    incorrectly named Arabic, Thaana, or Indic characters. I think that the
    Unicode-hosted CLDR is the best place to complement the Unicode UCD with
    more descriptive names.

    This archive was generated by hypermail 2.1.5 : Tue Apr 12 2005 - 15:26:54 CST