From: Peter Kirk (firstname.lastname@example.org)
Date: Sun Apr 03 2005 - 16:36:15 CST
On 03/04/2005 22:28, Doug Ewell wrote:
>Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>>>Yes. New CJK compatibility ideographs U+FA70..U+FAD9 have canonical
>>>decompositions into single characters. For example NFC(U+FACF) =
>>>U+2284A (for the first time a BMP character is normalized to
>>>something outside BMP).
>>Isn't that against Unicode stability? Shouldn't it have been the
>>reverse, keeping U+FACF stable and normalizing U+2284A to U+FACF to
>>keep the compatibility? If this was added because of a past error,
>>then this MUST be urgently documented.
>They're new characters, Philippe. They weren't encoded until 4.1.
In that case these character allocations seem perverse: both characters
could have been assigned to the BMP, or both outside it, or the
normalisation could have run in the reverse direction, as Philippe suggested.
There is a serious danger of breaking existing implementations
(especially those which only fully support the BMP) by introducing a BMP
character which normalises to outside the BMP. The BMP is no longer
closed under normalisation, a property which existing implementations
may well have relied on. Maybe someone
thought this was a good idea, to force implementations to be upgraded,
but it strikes me as a recipe for disaster. It could also be a serious
security hole, as hackers try sending U+FACF to various implementations
in an attempt to crash them.
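
To make the closure point concrete, here is a minimal sketch in Python that checks whether a BMP code point normalises to something outside the BMP. The helper name leaves_bmp and the constant BMP_MAX are my own, purely for illustration, and the result depends on which Unicode version the interpreter's unicodedata module carries (it needs 4.1 or later to know these characters at all).

```python
import unicodedata

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def leaves_bmp(cp: int, form: str = "NFC") -> bool:
    """Return True if BMP code point `cp` normalises to text that
    contains at least one code point outside the BMP."""
    if cp > BMP_MAX:
        return False  # not a BMP character to begin with
    normalised = unicodedata.normalize(form, chr(cp))
    return any(ord(ch) > BMP_MAX for ch in normalised)


# Scan the compatibility ideographs mentioned above (U+FA70..U+FAD9)
# and list those whose NFC form lands outside the BMP.
escaping = [cp for cp in range(0xFA70, 0xFADA) if leaves_bmp(cp)]

for cp in escaping:
    target = unicodedata.normalize("NFC", chr(cp))
    mapped = " ".join(f"U+{ord(ch):04X}" for ch in target)
    print(f"U+{cp:04X} -> {mapped}")
```

An implementation that stores text in 16-bit units without surrogate handling could run a check of this kind over its input before normalising, and reject or specially handle anything for which it returns True.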
-- Peter Kirk email@example.com (personal) firstname.lastname@example.org (work) http://www.qaya.org/