From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Jan 05 2005 - 14:04:07 CST
On Wednesday, January 5th, 2005 19:17Z Kenneth Whistler wrote:
>> The Tibetan characters are _never_ encoded using Unicode in this
>> process, are they?
>> Looks like a clear case of nonconformance to me.
> Not at all.
Indeed, it seems there is no requirement to use Unicode-defined code points to
represent anything. Surprising (to me), but I guess it is the price to pay
for upward compatibility.
> If an application clearly states what it is doing, it can
> do this conformantly in Unicode.
> The Unicode *conformance* issue there is whether the Latin
> letter "b" used in the Wylie transliteration is correctly
> represented as U+0062, and whether, if using UTF-16, that
> shows up in stored data and strings as a 16-bit code unit,
> 0x0062, or if using UTF-8, that shows up in stored data
> and strings as an 8-bit code unit, 0x62, and so on.
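A minimal Python sketch of the code-unit point in the quoted paragraph, using only the built-in `str.encode`. Choosing big-endian UTF-16 (`"utf-16-be"`) is an assumption made so that Python does not prepend a byte order mark.

```python
# Code units for U+0062 LATIN SMALL LETTER B in two Unicode encoding forms.
ch = "b"
assert ord(ch) == 0x0062  # the code point itself

# UTF-8: one 8-bit code unit.
utf8_units = list(ch.encode("utf-8"))

# UTF-16 (big-endian, so no BOM is emitted): pair bytes into 16-bit code units.
raw16 = ch.encode("utf-16-be")
utf16_units = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]

print(f"U+{ord(ch):04X}")                            # U+0062
print("UTF-8 :", [f"0x{u:02X}" for u in utf8_units])   # UTF-8 : ['0x62']
print("UTF-16:", [f"0x{u:04X}" for u in utf16_units])  # UTF-16: ['0x0062']
```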
But there is _no_ Latin letter "b" here; we are dealing with Tibetan
letters, aren't we?
Or did you switch one level lower, disregarding the semantic meaning of the
transliterated text and attending only to the graphemes used in the
transliteration, which happen to be English letters in ASCII/UTF-8?
To take a more extreme (and dumb) example, let's assume I have an
ISCII-based rendering system, using Roman (reversed for you)
transliterations but not plain English (that is, both A and a would be
written \xA4 if we speak about the grapheme, or \xAC if we speak about the
English letter). Furthermore, it exchanges them by adding a signaling 0xEC00
to the ISCII code points, while adding nothing to the ASCII code points,
resulting in using the ranges 0x000A-0x0040, 0x005B-0x0060, 0x007B-0x007E,
Can I claim conformance to Unicode/10646 on the basis that I am using code points
0020 for SPACE, 002C for COMMA, etc., that I do not destroy surrogates, that I do
not emit FFFF, and so on?
[ Or is there a special case for the Latin letters that disallows this? ]
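A minimal Python sketch of the hypothetical ISCII-offset scheme above and of the superficial checks listed in the question. The function names, the sample input `b"\xa4 \xa4,"`, and the treatment of every byte at or above 0x80 as ISCII are assumptions for illustration only. With an offset of 0xEC00, every shifted byte lands in U+EC80..U+ECFF, inside the BMP Private Use Area (U+E000..U+F8FF), so the output passes these checks even though no Indic letter is stored at its Unicode-defined code point.

```python
# Hypothetical ISCII-offset scheme from the example (not a real standard),
# plus superficial checks of the kind listed in the question.
OFFSET = 0xEC00  # the "signaling" offset added to ISCII bytes

def encode_hypothetical(iscii: bytes) -> list[int]:
    """ASCII bytes (< 0x80) pass through; ISCII bytes (>= 0x80) are shifted by OFFSET."""
    return [b if b < 0x80 else OFFSET + b for b in iscii]

def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane (U+xxFFFE, U+xxFFFF)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def passes_superficial_checks(code_points: list[int]) -> bool:
    """In range, no surrogate code points, no noncharacters such as U+FFFF."""
    return all(
        0 <= cp <= 0x10FFFF and not is_surrogate(cp) and not is_noncharacter(cp)
        for cp in code_points
    )

# "A" (or "a") as ISCII 0xA4, ASCII SPACE, 0xA4 again, ASCII COMMA.
units = encode_hypothetical(b"\xa4 \xa4,")
print([f"U+{u:04X}" for u in units])        # ['U+ECA4', 'U+0020', 'U+ECA4', 'U+002C']
print(passes_superficial_checks(units))     # True
print(passes_superficial_checks([0xFFFF]))  # False
```

Passing such checks only shows that the stream is a well-formed sequence of code points; it says nothing about whether the letters are represented by the characters Unicode assigns to them.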
Second question: if the answer to the above is "Yes, I can claim conformance", what is the
point of claiming conformance to Unicode/10646 in such a case?
I remember Peter Constable once remarking that a process that rings the bell
when given the code 7 is Unicode-conformant.
This archive was generated by hypermail 2.1.5 : Wed Jan 05 2005 - 14:09:53 CST