Re: VS: continue: Glaring Mistake in nomenclature from Philippe Verdy on 2011-09-14 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 14 Sep 2011 11:03:40 +0200

I can partly excuse him if he is not aware of the technical
justification for why names are immutable. But what he really wants is
to avoid being exposed to these "Bengali" names. This is not a matter
of technical encoding but rather a question of localisation (for
example, when using a character picker application, or when searching
character collections by name).

Nothing forbids a localized application from using accurate names that
match a specific language's expectations about its alphabet.
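To make this concrete, here is a minimal Python sketch (an illustration only, not an existing tool) of how a character picker could show language-specific display names while keeping the standard names underneath. The `LOCALIZED_NAMES` table, the `display_name` and `search` helpers, and the "ASSAMESE LETTER ..." labels are all hypothetical; only `unicodedata.name` is a real standard-library call, and it always returns the immutable Unicode character name.

```python
import unicodedata

# Hypothetical application-level table of localized display names.
# These are NOT Unicode character names; the standard names never change.
LOCALIZED_NAMES = {
    "as": {  # Assamese (illustrative labels only)
        "\u0995": "ASSAMESE LETTER KA",
        "\u09f0": "ASSAMESE LETTER RA",
    },
}


def display_name(ch: str, lang: str) -> str:
    """Return the app's localized name if it defines one,
    otherwise fall back to the immutable Unicode name."""
    localized = LOCALIZED_NAMES.get(lang, {}).get(ch)
    if localized is not None:
        return localized
    return unicodedata.name(ch, f"U+{ord(ch):04X}")


def search(query: str, lang: str, chars: str) -> list[str]:
    """Match the query against both the localized and the standard name,
    so users can find a character either way."""
    q = query.upper()
    return [
        c for c in chars
        if q in display_name(c, lang) or q in unicodedata.name(c, "")
    ]
```

Searching for "ASSAMESE" in an Assamese locale finds only the letters the application chose to relabel, while searching for "BENGALI" still finds every letter by its standard name: the localized layer sits on top of the encoding without changing it.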

But Mr Delex must understand that the UCS (as encoded by Unicode or
ISO/IEC 10646) does NOT encode language-specific alphabets, but "unified"
scripts that share many common letters and a common structure, in
such a way that those languages can be freely mixed and interchanged
without duplicating the letters.

Maybe the question could be forwarded to the CLDR project, regarding the
localisation of letter names for language-specific alphabets. For now,
the CLDR project still has trouble just determining which letters are
representative of the orthography used by a single language (for
example, is the letter "é" part of the English alphabet? Is the letter
"ā" part of the French alphabet, because it is used in official
toponymy? The same goes for "Å", for example in "Åland"...)

Just consider how we use alphabets today: we frequently borrow letters
from foreign alphabets, very easily, because they are in fact part of
the same unified "script". Still, we do not necessarily need to name
those borrowed letters locally after our own alphabet for our local
language.

But existing characters won't generally be reencoded in the UCS. The
UCS did choose NOT to unify the Latin, Greek and Cyrillic alphabets,
and later even chose to disunify the Coptic alphabet from Greek; on the
other hand, it refused to disunify the Fraktur and Celtic alphabets
from Latin, because there were no frequent, clearly contrasting cases
where such disunification would be necessary. At the same time, the
UCS still maintains the IPA symbol set as a full part of the Latin
script, even though this required reencoding some Greek letters in the
Latin script.
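As a small illustration of that last point, Python's standard `unicodedata` module shows that IPA letters derived from Greek are separate code points carrying LATIN names, distinct from the Greek letters they resemble. The three pairs below are just examples I picked, not an exhaustive list.

```python
import unicodedata

# IPA letters encoded in the Latin script, paired with the Greek letters
# they were derived from. Each pair consists of two distinct code points.
IPA_VS_GREEK = [
    ("\u0251", "\u03b1"),  # alpha
    ("\u0263", "\u03b3"),  # gamma
    ("\u0278", "\u03c6"),  # phi
]

for latin, greek in IPA_VS_GREEK:
    print(f"U+{ord(latin):04X} {unicodedata.name(latin):<28}"
          f"U+{ord(greek):04X} {unicodedata.name(greek)}")
```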

There are technical tradeoffs in those decisions to unify or disunify
scripts. But it is important to understand that scripts in the UCS are
definitely not the same as alphabets; a script still needs to be
(arbitrarily) named after the most representative alphabet encoded in
it (or an alphabet already supported by a widely used legacy
standard), and there is good reason to give all characters encoded in
that script technical names that reference this arbitrary script name,
independently of their use in language-specific alphabets.

Notes:

- Above, I ignore the subclassification of "alphabets" into "true
alphabets", "abjads", "abugidas", or even "syllabaries", even though
this classification plays a very important role in the decision to
unify or disunify them within the same "script" in the UCS.

- For Mr Delex: the "UCS" is the Universal Character Set, i.e. the
same **unified** repertoire standardized and encoded internationally
by both the Unicode standard and the ISO/IEC 10646 standard (and all
their annexes). Neither standard encodes language-specific alphabets,
and neither can give the distinctive names used in various languages
to refer to the same unified characters.

-- Philippe.

2011/9/14 Erkki I Kolehmainen <eik_at_iki.fi>:
> Dear Mr. Delex,
>
> Please, please spare us from further details in support of your crusade. You should finally accept the fact that the official block name cannot be changed, the rest is effectively OT.
>
> Thank you!
>
> Erkki I. Kolehmainen
Received on Wed Sep 14 2011 - 04:06:29 CDT
