From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jan 05 2005 - 09:57:57 CST
From: "Kenneth Whistler" <kenw@sybase.com>
>> Just using UTF-16 as character set, and we are registered as
>> conformant to ISO/IEC 10646. Nice, after all.
>
> Well, you would need to be using UTF-16 *correctly*. :-)
It's not enough: UTF-16 is just an encoding form (or scheme) and conformance
to UTF-16 encoding form just means that you will only use it to encode
codepoints U+0000 to U+10FFFF *inclusive*, without unpaired surrogates.
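To make the "no unpaired surrogates" condition concrete, here is a minimal Python sketch that checks a sequence of 16-bit code units for well-formedness. The function name and the list-of-integers input are assumptions chosen for illustration; they do not come from any standard API.

```python
def is_well_formed_utf16(units):
    """Return True if the 16-bit code units contain no unpaired surrogate."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            return False                 # not a 16-bit code unit at all
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed immediately by a low surrogate.
            if i + 1 >= n or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not preceded by a high surrogate.
            return False
        else:
            i += 1
    return True


print(is_well_formed_utf16([0x0041, 0xD835, 0xDC00]))  # "A" plus one surrogate pair
print(is_well_formed_utf16([0x0041, 0xD835]))          # trailing unpaired high surrogate
```

A well-formed surrogate pair always decodes to a code point in U+10000..U+10FFFF, so this check is enough to guarantee the U+0000..U+10FFFF range as well.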
ISO-10646 conformance adds some requirements, because ISO-10646 maps
codepoints to characters:
- you must not encode any character using unassigned codepoints
- you must not use any noncharacter codepoints to represent plain text.
- you must obey the standard definition of abstract characters for
plain-text: a Latin capital "A" must be encoded with codepoint U+0041, not
U+0042.
- if you need to encode data which cannot be bound to the existing abstract
characters, the only way is to use PUA codepoints.
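The first, second and last requirements above can be checked mechanically. The following is only a sketch: it relies on Python's `unicodedata` module, whose notion of "unassigned" depends on the Unicode version shipped with the interpreter, and the names `classify` and `violations` are hypothetical helpers, not part of any standard.

```python
import unicodedata


def is_noncharacter(cp):
    """U+FDD0..U+FDEF plus the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(ch):
    """Classify one code point for plain-text interchange."""
    cp = ord(ch)
    if is_noncharacter(cp):
        return "noncharacter"
    category = unicodedata.category(ch)
    if category == "Cs":
        return "surrogate"
    if category == "Co":
        return "private-use"      # allowed, but only by private agreement
    if category == "Cn":
        return "unassigned"       # per the interpreter's Unicode version
    return "assigned"


def violations(text):
    """List (index, code point, reason) for code points that must not appear."""
    bad = []
    for i, ch in enumerate(text):
        kind = classify(ch)
        if kind in ("noncharacter", "surrogate", "unassigned"):
            bad.append((i, f"U+{ord(ch):04X}", kind))
    return bad


print(violations("A\uE000\uFFFE"))
```

The third requirement (using U+0041 and not U+0042 for "A") is a matter of semantics and cannot be verified by inspecting code points alone.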
ISO-10646 by itself is also NOT an encoding scheme. It is just the character
repertoire and its assigned numerical codes, called here "code points",
independently of the encoding schemes used to transport a stream of such
characters.
But the UTF-16 encoding scheme is bound to the ISO/IEC 10646 repertoire
*only* when it is used as a "charset", in the accepted IANA/MIME definition,
i.e. when it is used to label plain-text contents. A charset is the
combination of a character encoding scheme (which is the transformation of a
sequence of numerical codes into a stream of bytes; here one of the three
UTF-16 encoding schemes, based on the single UTF-16 encoding form) and a
repertoire (here the ISO/IEC 10646 character repertoire, also used but not
defined by Unicode).
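As an illustration of one encoding form giving three encoding schemes, the sketch below serializes the same two code points with Python's "utf-16-be", "utf-16-le" and "utf-16" codecs. The helper name `utf16_schemes` is an assumption for the example; the codec names are the ones Python itself provides.

```python
def utf16_schemes(text):
    """Serialize the same code points with the three UTF-16 encoding schemes."""
    return {
        "UTF-16BE": text.encode("utf-16-be"),  # big-endian, no BOM
        "UTF-16LE": text.encode("utf-16-le"),  # little-endian, no BOM
        "UTF-16": text.encode("utf-16"),       # BOM, then the platform's byte order
    }


# U+0041 is one code unit; U+1D400 needs the surrogate pair D835 DC00.
for label, data in utf16_schemes("A\U0001D400").items():
    print(label, data.hex(" "))
```

The code units are identical in all three cases; only the byte serialization (and the presence of a byte order mark) differs, which is exactly why the label must say which scheme is in use.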
So under this definition, an application that uses any MIME/IANA registered
charset, with a published code mapping from the streamed encoded bytes to
code points, will be conforming to ISO/IEC 10646, because it uses
unambiguous code points, even if these code points are represented by 8-bit
code units of a legacy charset. However, to claim this conformance, the
application MUST also clearly label which mapping is used. This requires
using unambiguous charset labels, with agreed and standard mappings.
For example, an application exchanging data encoded with the GB18030 charset
will be conforming, provided that it restricts itself to using only the
intersection of the GB18030 repertoire and the ISO/IEC 10646 repertoire.
(Since the mapping between GB18030 and ISO/IEC 10646 is now well defined and
closed, the only way for the repertoire associated with GB18030 to be
extended is for the repertoire of ISO/IEC 10646 to be extended.) The same is
true of the ISO-8859-* charsets, which have a permanent mapping table between
their encoded byte values and code points.
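A small Python sketch of what "a published mapping from encoded bytes to code points" means in practice: the same byte value yields different code points depending on the charset label, so the label is part of the data. The helper `code_points` is a hypothetical name; the codecs are those bundled with Python.

```python
def code_points(data, charset):
    """Decode bytes under a labeled charset and list the resulting code points."""
    return [f"U+{ord(ch):04X}" for ch in data.decode(charset)]


# One byte, two labels, two different characters.
print(code_points(b"\xa4", "iso-8859-1"))
print(code_points(b"\xa4", "iso-8859-15"))

# A two-byte GB18030 sequence also resolves to a single ISO/IEC 10646 code point.
print(code_points(b"\xd6\xd0", "gb18030"))
```

Without the label, the byte 0xA4 is ambiguous; with it, the text is fully expressed in terms of ISO/IEC 10646 code points even though no UTF is involved.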
But this is not the same for Windows codepages: they are open, and can be
extended at any time by Microsoft, so their mapping is not fixed. The
solution is to use unambiguous *versioned* labels for these codepages, and
not to use only the codepage number: a "cp1252" charset label is not enough
to determine the mapping used. So texts exchanged with an unversioned
"cp1252" charset identifier will not be conforming to ISO/IEC 10646, and
thus not to Unicode either. Unfortunately, the way to specify the codepage
version in the charset identifier is neither specified nor standardized by
Microsoft, which regularly changes these mappings by adding new
assignments, without creating a new identifier for the charset label.
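To see why an open mapping is a problem, note that any decoder necessarily embeds one particular snapshot of the table. The sketch below probes the "cp1252" codec bundled with Python, which is only one such snapshot and not necessarily the table a given Windows release uses; the helper name `decodable` is an assumption for the example.

```python
def decodable(byte_value, charset="cp1252"):
    """True if this codec's snapshot of the table assigns the byte value."""
    try:
        bytes([byte_value]).decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Byte values that this snapshot leaves unassigned.
holes = [f"0x{b:02X}" for b in range(256) if not decodable(b)]
print(holes)
```

Any byte value listed as a hole could receive an assignment in a later revision of the codepage, and a receiver holding an older table would then fail or misread the text, since the unversioned label gives it no way to tell which table the sender used.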
EU laws do not require ISO 10646 conformance. What EU laws specify is that a
product that claims conformance to an EU standard must clearly identify
this standard (this is true if the product claims conformance to an EU
member's national standard, such as AFNOR in France or DIN in Germany, or
to an EU standard with the "CE" logo.)
The conditions related to the legal use of the "CE" logo are specified
by each standard, and no product is allowed to use the "CE" logo if it does
not respect the conformance requirements associated with the referenced EU
standard, at the time the product is manufactured or imported into the EU
(or into countries of the EFTA economic area, where most CE standards are
applicable, and which includes non-EU countries like Norway or Switzerland).
Some CE standards may also be applicable to other countries which have
agreed to become signing parties to this standard (for example Turkey, or
other candidate countries for future EU membership, which may already be
in transition to progressively integrate the EU standards into their
legislation).
There has been much discussion about this issue, but I still have not been
able to determine clearly *which* CE standard requires conformance to the
ISO/IEC 10646 standard.
Also the term "EU law" is not correct. There is no "EU law" as such. There
are recommendations by the European Parliament, but all legal texts that have
some force come from the European Commission or the European Council
of Ministers. When they are voted, they become "Directives", but these
directives cannot become legal before they are applied by national laws in
each member country, which must formulate these directives and determine
which minimum number of items will be needed to apply such a directive.
The situation is complex, because directives are not always applied
*completely* and exactly the way they are written in the original European
directive: each national parliament must study its own text, and can amend
the directive, and must produce a report back to the European institutions,
detailing how the national text applies this directive: if there's a
sufficient quorum of articles to apply, the directive is said to be
"implemented" by the national law, and the member country's legislation
conforms to the European directives (a member country must implement a
directive within a limited time, about 2 years after it is decided).
Conclusion: it is not enough for products imported into the EU to conform to
the EU legislation. However, this legislation allows the product to come
into the economic area, and if it conforms to at least one national
legislation, it will be allowed to be sold in the whole economic area,
with some exceptions, which are detailed in the reports concerning each
member country and which are appended to the initial European legislation to
specify the excluded or modified clauses.
This archive was generated by hypermail 2.1.5 : Wed Jan 05 2005 - 11:36:16 CST