Re: ISO 10646 compliance and EU law

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jan 05 2005 - 09:57:57 CST

    From: "Kenneth Whistler" <kenw@sybase.com>
    >> Just using UTF-16 as character set, and we are registered as
    >> conformant to ISO/IEC 10646. Nice, after all.
    >
    > Well, you would need to be using UTF-16 *correctly*. :-)

    It's not enough: UTF-16 is just an encoding form (or scheme), and conformance
    to the UTF-16 encoding form just means that you will only use it to encode
    codepoints U+0000 to U+10FFFF *inclusive*, without unpaired surrogates.
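
    As a rough illustration of the "no unpaired surrogates" condition, here is
    a minimal Python sketch that scans a sequence of 16-bit code units. The
    function name is mine, not something taken from either standard, and it
    checks nothing beyond surrogate pairing.

```python
def find_unpaired_surrogates(units):
    """Return the indexes of unpaired surrogates in a list of 16-bit code units."""
    bad = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:                      # lead (high) surrogate
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2                                 # well-formed pair, skip both
                continue
            bad.append(i)                              # lead without a trail
        elif 0xDC00 <= u <= 0xDFFF:
            bad.append(i)                              # trail without a lead
        i += 1
    return bad


# "A" followed by U+10400 encoded as the pair D801 DC00: nothing to report.
print(find_unpaired_surrogates([0x0041, 0xD801, 0xDC00]))
# A lead surrogate followed by an ordinary code unit is reported at index 0.
print(find_unpaired_surrogates([0xD801, 0x0041]))
```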

    ISO-10646 conformance adds some requirements, because ISO-10646 maps
    codepoints to characters:
    - you must not encode any character using unassigned codepoints
    - you must not include any non-character codepoints to represent plain text.
    - you must obey the standard definition of abstract characters for
    plain-text: a Latin capital "A" must be encoded with codepoint U+0041, not
    U+0042.
    - if you need to encode data which cannot be bound to the existing abstract
    characters, the only way is to use PUA codepoints.
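
    The first, second and last requirements above can be sketched in Python
    with the standard unicodedata module. This is only an approximation: the
    module reflects whatever Unicode version ships with the interpreter, which
    I am assuming is close enough to the ISO/IEC 10646 repertoire for
    illustration. The function name and the returned labels are my own.

```python
import unicodedata


def classify(cp: int) -> str:
    """Classify a code point for use in plain text (illustrative labels)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the code space")
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    cat = unicodedata.category(chr(cp))
    if cat == "Cs":
        return "surrogate"
    if cat == "Co":
        return "private-use"       # the only place for private data
    if cat == "Cn":
        return "unassigned"        # must not be used to encode a character
    return "assigned"


for cp in (0x0041, 0xFFFE, 0xE000, 0x0378):
    print(f"U+{cp:04X}", classify(cp))
```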

    ISO-10646 by itself is also NOT an encoding scheme. It is just the character
    repertoire and its assigned numerical codes, called here "code points",
    independently of the encoding schemes used to transport a stream of such
    characters.

    But the UTF-16 encoding scheme is bound to the ISO/IEC 10646 repertoire
    *only* when it is used as a "charset", in the accepted IANA/MIME definition,
    i.e. when it is used to label plain-text contents. A charset is the
    combination of a character encoding scheme (the transformation of a
    sequence of numerical codes into a stream of bytes; here one of the three
    UTF-16 encoding schemes, all based on the single UTF-16 encoding form) and a
    repertoire (here the ISO/IEC 10646 character repertoire, also used but not
    defined by Unicode).
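
    To make the form/scheme distinction concrete, here is a small Python
    sketch using the codec names Python happens to provide ("utf-16-be",
    "utf-16-le", "utf-16"). The code units are the same in all three cases;
    only the byte serialization differs.

```python
text = "\U00010400"                # one code point, U+10400

# Encoding form: the two 16-bit code units D801 DC00 (a surrogate pair).
# Encoding schemes: how those code units are serialized into bytes.
be = text.encode("utf-16-be")      # big-endian, no byte order mark
le = text.encode("utf-16-le")      # little-endian, no byte order mark
with_bom = text.encode("utf-16")   # byte order mark first, then native order

print(be.hex())                    # d801dc00
print(le.hex())                    # 01d800dc

# All three byte streams decode back to the same code point.
assert be.decode("utf-16-be") == le.decode("utf-16-le") == with_bom.decode("utf-16")
```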

    So under this definition, an application that uses any MIME/IANA registered
    charset, with a published code mapping from the streamed encoded bytes to
    code points, will be conforming to ISO/IEC 10646, because it uses
    unambiguous code points, even if these codepoints are represented by 8 bit
    code units of a legacy charset. However, to claim this conformance, the
    application MUST also clearly label which mapping is used. This requires
    using unambiguous charset labels, with agreed and standard mappings.
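
    A short Python sketch shows why the label matters: the same byte maps to
    different code points under different registered charsets, so without the
    label the code point is undetermined. The helper name is hypothetical; the
    codec names are the ones Python accepts.

```python
def to_code_points(data: bytes, charset: str) -> list:
    """Decode bytes with the labelled charset and return the code points."""
    return [ord(ch) for ch in data.decode(charset)]


raw = b"\xe9"
for label in ("iso-8859-1", "iso-8859-7"):
    cps = to_code_points(raw, label)
    print(label, [f"U+{cp:04X}" for cp in cps])
# iso-8859-1 gives U+00E9 (Latin small e with acute),
# iso-8859-7 gives U+03B9 (Greek small iota).
```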

    For example, an application exchanging data encoded with the GB18030 charset
    will be conforming, provided that it restricts itself to using only the
    intersection of the GB18030 repertoire and the ISO/IEC 10646 repertoire.
    (Since the mapping between GB18030 and ISO/IEC 10646 is now well defined and
    closed, the only way for the repertoire associated with GB18030 to be
    extended is for the repertoire in ISO/IEC 10646 to be extended.) The same is
    true of the ISO-8859-* charsets, which have a permanent mapping table
    between their encoded byte values and code points.
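
    Staying within the repertoire of a charset with a fixed mapping can be
    checked mechanically. The sketch below relies on Python's built-in codecs,
    which I am assuming follow the published mapping tables; the function name
    is mine.

```python
def in_repertoire(text: str, charset: str) -> bool:
    """True if every character of text has an encoding in the given charset."""
    try:
        text.encode(charset)       # strict mode: fails on unmappable characters
    except UnicodeEncodeError:
        return False
    return True


print(in_repertoire("caf\u00e9", "iso-8859-1"))     # e-acute is in Latin-1
print(in_repertoire("\u20ac5", "iso-8859-1"))       # the euro sign is not
print(in_repertoire("\u20ac5", "iso-8859-15"))      # but it is in Latin-9

# A fixed mapping also means the bytes round-trip to the same code points.
sample = "\u4e2d\u6587"
assert sample.encode("gb18030").decode("gb18030") == sample
```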

    But this is not the same for Windows codepages: they are open, and can be
    extended at any time by Microsoft, so their mapping is not fixed. The
    solution is to use unambiguous *versioned* labels for these codepages, and
    not to use only the codepage number: a "cp1252" charset label is not enough
    to determine the mapping used. So texts exchanged with an unversioned
    "cp1252" charset identifier will not be conforming to ISO/IEC 10646, and
    thus not to Unicode either. Unfortunately, the way to specify the codepage
    version in the charset identifier is neither specified nor standardized by
    Microsoft, which regularly changes these mappings by adding new
    assignments, without creating a new identifier for the charset label.
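
    What a versioned label could look like is sketched below in Python. To be
    clear, the labels "cp1252-pre-euro" and "cp1252-euro" are hypothetical:
    no such identifiers are defined by Microsoft or registered with IANA. The
    table is illustrative data covering a single byte, 0x80, which I am
    assuming was unassigned before the euro sign was added there; everything
    else falls back to Python's own "cp1252" codec.

```python
# Hypothetical versioned labels with illustrative per-version differences.
VERSION_OVERRIDES = {
    "cp1252-pre-euro": {0x80: None},      # assumed: 0x80 not yet assigned
    "cp1252-euro": {0x80: 0x20AC},        # 0x80 mapped to EURO SIGN
}


def decode_versioned(data: bytes, label: str) -> str:
    """Decode bytes using the mapping selected by a versioned label."""
    overrides = VERSION_OVERRIDES[label]  # KeyError for an unversioned label
    out = []
    for b in data:
        if b in overrides:
            cp = overrides[b]
            if cp is None:
                raise ValueError(f"byte 0x{b:02X} is unassigned in {label}")
            out.append(chr(cp))
        else:
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)


print(decode_versioned(b"\x80 5", "cp1252-euro"))
```

    With the bare label "cp1252" the lookup fails, which is the point: the
    receiver cannot tell which of the two mappings the sender meant.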

    EU laws do not require ISO 10646 conformance. What EU laws specify is that a
    product that claims conformance to an EU standard must clearly identify
    that standard (this is true whether the product claims conformance to the
    national standard of an EU member, such as AFNOR in France or DIN in
    Germany, or to an EU standard with the "CE" logo.)

    The conditions related to the legal use of the "CE" standard are specified
    by each standard, and no product is allowed to use the "CE" logo if it does
    not respect the conformance requirements associated with the referenced EU
    standard, at the time the product is manufactured or imported into the EU
    (or into countries of the EFTA economic area, where most CE standards are
    applicable, and which includes non-EU countries like Norway or Switzerland).
    Some CE standards may also be applicable to other countries which have
    agreed to become signing parties to the standard (for example Turkey, or
    other candidate countries for future EU membership that may already be
    progressively integrating the EU standards into their legislation).

    There has been much discussion of this issue, but I still have not been
    able to determine clearly *which* CE standard requires conformance to the
    ISO/IEC 10646 standard.

    Also the term "EU law" is not correct. There is no "EU law" as such. There
    are recommendations by the European Parliament, but all legal texts that
    have some force come from the European Commission or the European Council
    of Ministers. When they are voted, they become "Directives", but these
    directives cannot take legal effect before they are applied by national
    laws in each member country, which must formulate these directives and
    determine the minimum number of items needed to apply each directive.

    The situation is complex, because directives are not always applied
    *completely* and exactly the way they are written in the original European
    directive: each national parliament must study its own text, can amend
    the directive, and must report back to the European institutions,
    detailing how the national text applies the directive. If there is a
    sufficient quorum of articles applied, the directive is said to be
    "implemented" by the national law, and the member country's legislation
    conforms to the European directive (a member country must implement a
    directive within a limited time, about 2 years after it is decided).

    Conclusion: it is not enough for products imported into the EU to conform to
    the EU legislation. However, this legislation allows the product to come
    into the economic area, and if it conforms to at least one national
    legislation, it will be allowed to be sold in the whole economic area,
    with some exceptions that are detailed in the reports concerning each
    member country and that are appended to the initial European legislation to
    specify the excluded or modified clauses.
