Re: String name and Character Name

From: Philippe Verdy
Date: Thu Apr 21 2005 - 08:28:17 CST

Next message: Doug Ewell: "Re: String name and Character Name"

Previous message: Peter Kirk: "Re: String name and Character Name"
In reply to: Asmus Freytag: "Re: String name and Character Name"
Next in thread: John Hudson: "Re: String name and Character Name"

From: "Asmus Freytag"
>I have been watching this exchange for a while without dipping in an oar,
>but some of the more recent postings still show such deep misunderstanding
>of how to use the Unicode Standard that I thought a contribution
>worthwhile.
>
> It was suggested that:
>
>>A list of character names is useful only if it is entirely reliable, or at
>>least is moving towards being so. If this list contains only one error
>>(and there are a lot more) which is not going to be corrected, then the
>>list is worthy of nothing but to be thrown out and replaced - if only by
>>another almost identical list, which can be corrected.
>
>
> This appears to me a rather absolutist position. Any large list is likely
> to contain some errors. This is bad enough, but the understanding of what
> is correct may not be shared. Therefore, attaining a status of 'proven
> correct' is effectively impossible, even if it can be contemplated as a
> theoretical ideal.

Yes, but nothing prevents creating such a "common" list (like the CLDR
resources) that simply represents the best known practice for each language.

As long as this list is not made immutable in the standard, but is clearly
stated to be correctable over time, no one will use these names to identify
characters in critical internal algorithms (such as Unicode regular
expressions using "\N{NAME}" specifiers, which assume that the name is fixed
over time even if it looks "wrong").
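To illustrate why code should key only on the formal names, here is a small sketch in Python (used purely as an example, and assuming Python 3.3 or later, where name aliases are supported). The formal name of U+01A2 is LATIN CAPITAL LETTER OI, generally considered a misidentification; Unicode's NameAliases.txt records LATIN CAPITAL LETTER GHA as a correction alias, but the formal name itself never changes. The DISPLAY_LABELS table is hypothetical and stands in for a correctable, localizable list.

```python
import unicodedata

# The formal name is an immutable identifier, so code may hard-code it.
OI = "\N{LATIN CAPITAL LETTER OI}"
assert OI == "\u01A2"

# unicodedata.name() always returns the formal name, even though a
# correction alias now exists for this character.
print(unicodedata.name(OI))  # LATIN CAPITAL LETTER OI

# lookup() also accepts formal name aliases (Python 3.3+),
# such as the correction alias recorded for U+01A2.
print(unicodedata.lookup("LATIN CAPITAL LETTER GHA") == OI)  # True

# A user-facing label lives in a separate, editable table.
# This table is hypothetical and only for illustration.
DISPLAY_LABELS = {"\u01A2": "capital letter gha"}
print(DISPLAY_LABELS.get(OI, unicodedata.name(OI)))  # capital letter gha
```

Program logic keeps using the stable name (or the code point), while the display label can be corrected freely.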

But this localizable list will still be far better for users than the list
of standard names (which GUI applications would then use only as a fallback
when the localizable list does not yet define a name for some character).
Users of a given language will thus see the character names they expect, and
will see the default technical names only for characters they are not
familiar with.
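A minimal sketch of that fallback, assuming a hypothetical per-locale table keyed by code point (LOCALIZED_NAMES and display_name are illustrative names, not part of any standard): the localized label is shown when one exists, and the formal Unicode name otherwise. Keying the table by code point rather than by name means a label can be corrected without changing which character a program refers to.

```python
import unicodedata

# Hypothetical localized list for one locale, keyed by code point.
LOCALIZED_NAMES: dict[int, str] = {
    0x00E9: "e acute",
    0x00DF: "sharp s",
}

def display_name(ch: str, localized: dict[int, str]) -> str:
    """Return the localized label if defined, else the standard name."""
    label = localized.get(ord(ch))
    if label is not None:
        return label
    # Fallback: the formal Unicode name; unnamed code points
    # (e.g. controls) get a "U+XXXX" placeholder instead.
    return unicodedata.name(ch, f"U+{ord(ch):04X}")

for ch in "\u00E9\u00DF\u2603":
    print(f"U+{ord(ch):04X}: {display_name(ch, LOCALIZED_NAMES)}")
# U+00E9: e acute
# U+00DF: sharp s
# U+2603: SNOWMAN
```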

To make this work, one also needs to define a locale in which only the
default standard names are seen. This should be the POSIX "C" locale in
C/C++, or the root (base) locale in Java, and it would contain exactly these
standard names.

Users running programs in an English locale will thus see the non-standard
names "translated" into English (following the best practices known for
English); GUIs could also distinguish between US and British English
character names. Preferably, the localized character names should not be
fully capitalized, so that users can still tell the fully capitalized
standard ISO/IEC 10646 and Unicode character names apart from localized
names.
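The locale fallback described above (a regional English locale, then English, then a root or "C" locale carrying only the standard names) could be sketched as follows. The tables, locale identifiers, and helper names are hypothetical; the check that localized labels are not entirely upper case implements the capitalization convention suggested above.

```python
import unicodedata

# Hypothetical per-locale tables keyed by code point. "root" stands for the
# POSIX "C" locale: it is empty, so only the standard names are shown there.
TABLES: dict[str, dict[int, str]] = {
    "root": {},
    "en": {0x002E: "period"},
    "en_GB": {0x002E: "full stop"},
}

# Enforce the convention that localized labels are never fully capitalized,
# so they cannot be confused with the formal character names.
for loc, table in TABLES.items():
    for cp, label in table.items():
        if label == label.upper():
            raise ValueError(f"{loc}: label for U+{cp:04X} is all caps")

def fallback_chain(locale: str) -> list[str]:
    """E.g. "en_GB" -> ["en_GB", "en", "root"]; "C" -> ["root"]."""
    if locale in ("C", "POSIX", "root"):
        return ["root"]
    parts = locale.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return chain + ["root"]

def localized_name(ch: str, locale: str) -> str:
    """First label found along the fallback chain, else the standard name."""
    for loc in fallback_chain(locale):
        label = TABLES.get(loc, {}).get(ord(ch))
        if label is not None:
            return label
    return unicodedata.name(ch, f"U+{ord(ch):04X}")

for loc in ("en_US", "en_GB", "C"):
    print(loc, "->", localized_name(".", loc))
# en_US -> period
# en_GB -> full stop
# C -> FULL STOP
```

Note that the British label "full stop" coincides with the formal name FULL STOP; only the capitalization distinguishes them, which is exactly what the convention relies on.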

> Unicode has always recognized the fact that many characters have multiple
> common names, and provided aliases. The suggestion of potentially
> providing a set of preferred aliases in contexts where user interfaces
> must work from a single list has been made repeatedly.
> Rather than continuing the fruitless bickering over the status of the
> formal character names, energies should perhaps be reserved for collecting
> any additional information that would be needed to design something that
> fits the stated purpose of allowing "(English speaking) users to identify
> characters by name (or description)".

Fully agree.

But the annotations in the UCD are not enough, because they mix usage notes
(for example, the list of languages in which a character is used) with
alternate character names. There is a real need for a more concise and
parsable list of character names appropriate to each locale, with the
requirement that this list define not closed standard names but best
practices that can be corrected at any time (so that no program will depend
on the listed names to identify a specific character).
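As an illustration of what "parsable" could mean, here is a sketch of a reader for a hypothetical one-entry-per-line format: a hexadecimal code point, a semicolon, then the display name, with lines starting with "#" treated as comments. The format and the parse_names function are assumptions, not an existing UCD or CLDR file. Entries are keyed by code point so that nothing depends on the names themselves, and usage notes simply have no place in the format.

```python
# Hypothetical per-locale name list: "<hex code point>; <display name>".
SAMPLE = """\
# locale: en
00E9; e acute
00E7; c cedilla
"""

def parse_names(text: str) -> dict[int, str]:
    """Parse the hypothetical format into {code point: display name}."""
    names: dict[int, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cp_field, sep, name = line.partition(";")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected '<hex>; <name>'")
        cp = int(cp_field.strip(), 16)  # raises ValueError on bad hex
        if cp in names:
            raise ValueError(f"line {lineno}: duplicate entry for U+{cp:04X}")
        names[cp] = name.strip()
    return names

for cp, name in sorted(parse_names(SAMPLE).items()):
    print(f"U+{cp:04X} {name}")
# U+00E7 c cedilla
# U+00E9 e acute
```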

Best-practice open standards are common on the Internet today and have
proved very useful. It's good that they can be revised, that they are not
mandatory for conforming applications, and that users may tweak their
implementations of these open standards to match their actual needs. When
most users find that a tweak is always needed, it makes sense to integrate it
into the revised best-practice standard. The CLDR project is such an open
standard, built on shared knowledge of and experience with best practices.
A lot of discussion can be avoided because the lists are maintained
separately per locale/language, so discussions occur only within a single
locale, where consensus is more easily found.

This archive was generated by hypermail 2.1.5 : Thu Apr 21 2005 - 08:29:11 CST