Re: ISO 10646 compliance and EU law

From: Antoine Leca
Date: Wed Jan 05 2005 - 14:04:07 CST


    On Wednesday, January 5th, 2005 19:17Z Kenneth Whistler wrote:

    >> The Tibetan characters are _never_ encoded using Unicode in this
    >> process, are they?
    >> Looks like a clear case of nonconformance to me.
    > Not at all.

    Indeed, it seems there is no necessity to use Unicode-defined code points to
    represent anything. Surprising (to me), but I guess that is the price to pay
    to allow upward compatibility.

    > If an application clearly states what it is doing, it can
    > do this conformantly in Unicode.


    > The Unicode *conformance* issue there is whether the Latin
    > letter "b" used in the Wylie transliteration is correctly
    > represented as U+0062, and whether, if using UTF-16, that
    > shows up in stored data and strings as a 16-bit code unit,
    > 0x0062, or if using UTF-8, that shows up in stored data
    > and strings as an 8-bit code unit, 0x62, and so on.
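
    As a quick illustration of the code-unit point quoted above, here is a
    minimal Python sketch. The variable names are my own; it only shows how
    U+0062 appears in UTF-8 and in UTF-16 (big-endian, no BOM), nothing more.

```python
# Encode the Latin letter "b" (U+0062) and inspect the resulting code units.
letter = "b"
code_point = ord(letter)

utf8_bytes = letter.encode("utf-8")        # one 8-bit code unit
utf16_bytes = letter.encode("utf-16-be")   # one 16-bit code unit, big-endian, no BOM
utf16_unit = int.from_bytes(utf16_bytes, "big")

print(f"code point: U+{code_point:04X}")
print(f"UTF-8 code unit:  0x{utf8_bytes[0]:02X}")
print(f"UTF-16 code unit: 0x{utf16_unit:04X}")
```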


    But there is _no_ Latin letter "b" here; we are dealing with Tibetan
    letters, aren't we?

    Or did you switch one level lower, disregarding the semantic meaning of the
    transliterated text, to look only at the graphemes used in the
    transliteration, which happen to be English letters in ASCII/UTF-8?

    To take a more extreme (and dumb) example, let's assume I have an
    ISCII-based rendering system, using Roman (reversed for you)
    transliterations but not plain English (that is, both A and a would be
    written \xA4 if we speak about the grapheme, or \xAC if we speak about the
    English letter). Furthermore, it exchanges them by adding a signaling 0xEC00
    to the ISCII codepoints, while adding nothing to the ASCII codepoints,
    resulting in the use of the ranges 0x000A-0x0040, 0x005B-0x0060,
    0x007B-0x007E, and 0xECA1-0xECFA.
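
    To make the hypothetical scheme concrete, here is a rough Python sketch of
    the mapping just described. The names SIGNAL, to_code_unit and encode are
    made up for illustration, and the byte ranges are simply the ones listed
    above; this is not code from any real ISCII system.

```python
# Hypothetical exchange scheme from the message: ISCII bytes are shifted by
# a signaling offset of 0xEC00, ASCII bytes are passed through unchanged.
SIGNAL = 0xEC00

# ISCII bytes that map to 0xECA1-0xECFA after the shift.
ISCII_RANGE = range(0xA1, 0xFA + 1)

# ASCII values the scheme keeps: 0x0A-0x7E minus the letters A-Z and a-z.
ASCII_RANGES = [
    range(0x0A, 0x40 + 1),
    range(0x5B, 0x60 + 1),
    range(0x7B, 0x7E + 1),
]


def to_code_unit(byte: int) -> int:
    """Map one input byte to the 16-bit value the scheme would exchange."""
    if byte in ISCII_RANGE:
        return byte + SIGNAL
    if any(byte in r for r in ASCII_RANGES):
        return byte
    raise ValueError(f"byte 0x{byte:02X} is not used by this scheme")


def encode(data: bytes) -> list[int]:
    """Encode a mixed ISCII/ASCII byte string into 16-bit values."""
    return [to_code_unit(b) for b in data]


# \xA4 (grapheme), comma, space, \xAC (English letter name)
units = encode(b"\xA4, \xAC")
print([f"{u:04X}" for u in units])
```

    Note that Latin letters never appear in the output: to_code_unit rejects
    0x41-0x5A and 0x61-0x7A, and the shifted values 0xECA1-0xECFA all land
    inside the BMP Private Use Area (U+E000..U+F8FF).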

    Can I claim conformance to Unicode/10646 on the basis that I am using
    codepoints 0020 for SPACE, 002C for COMMA etc., that I do not destroy
    surrogates, that I do not emit FFFF, etc.?

    [ Or is there a special case for the Latin letters that disallows this? ]

    Second question: if the answer to the above is "Yes, I can claim
    conformance", what is the point of claiming conformance to Unicode/10646
    (in such a case)?
    I remember Peter Constable remarking once that a process that rings the bell
    when given the code 7 is Unicode-conformant.

