> ISO 10646 is 31 bits. All possible values should be allowed.
> I do not know why Unicode decided to grow beyond 16 bits,
> but not to the full 31 bits of ISO 10646.
JTC1/SC2/WG2 have declared that they will not assign characters past 0010FFFF,
except for the (de facto deprecated) private-use areas
at 00E00000-00FFFFFF and at 60000000-7FFFFFFF.
> But that is no reason to not allow full 31 bits in UTF-8 encoded
It is, indeed, the reason.
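
To make the consequence concrete, here is a minimal Python sketch of a
range check that a UTF-8 encoder restricted to 0010FFFF might apply.
The function names are hypothetical, and rejecting surrogate values is
an assumption of this sketch, not something stated above.

```python
# Hypothetical helpers illustrating the 0010FFFF ceiling; names are assumptions.
MAX_CODE_POINT = 0x10FFFF  # upper bound WG2 has committed to


def is_encodable_scalar(cp: int) -> bool:
    """Return True if cp is a value a restricted UTF-8 encoder would accept."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    # Assumption: surrogates (D800-DFFF) are rejected, since they are not
    # characters in their own right.
    return not 0xD800 <= cp <= 0xDFFF


def utf8_length(cp: int) -> int:
    """Number of bytes needed to encode cp; at most 4 under the ceiling."""
    if not is_encodable_scalar(cp):
        raise ValueError(f"value {cp:#x} is outside the permitted range")
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4
```

With this ceiling the 5- and 6-byte UTF-8 forms are never needed, since
every permitted value fits in at most four bytes.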
> You should also specify that Unicode Technical Report #15 normalisation
> form C should be used. This will greatly simplify encoding/decoding
> and help searching and case-insensitive comparisons.
I would even go further, to require Form KC (no compatibility characters)
as well, at least in headers if not in body text.
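
As a rough illustration of the difference between the two forms, here is
a small Python sketch using the standard unicodedata module. The wrapper
names to_nfc and to_nfkc are hypothetical, and the sample strings are
chosen only for the example.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Canonical composition only (Form C)."""
    return unicodedata.normalize("NFC", text)


def to_nfkc(text: str) -> str:
    """Compatibility decomposition, then canonical composition (Form KC)."""
    return unicodedata.normalize("NFKC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"     # LATIN SMALL LIGATURE FI, a compatibility character

# Both forms compose the accented letter into the single code point U+00E9.
assert to_nfc(decomposed) == "\u00e9"
assert to_nfkc(decomposed) == "\u00e9"

# Only Form KC replaces the compatibility ligature with plain "fi".
assert to_nfc(ligature) == "\ufb01"
assert to_nfkc(ligature) == "fi"
```

Form C leaves the ligature alone, so two headers that look alike could
still differ; Form KC folds them together, which is the point of
requiring it in headers.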
> And best would be if this was valid everywhere, both in the protocol
> headers and the body text. The current MIME-encodings in headers
> are terrible.
Agreed. I believe the current draft drops or deprecates those.
(This is news, not mail, remember.)
> No, case insensitivity should be available on all letters. It is
> very important for many people. For a protocol to work well, it
> should be implemented in a well-defined way, such as section 2.3
> of Unicode Technical Report #21.
But why do case folding at all? Simply forbid the use of uppercase
> As there is a group working on getting international characters into
> DNS, you may wait a little and see the results from them. It may
> affect the Usenet News protocol.
Where is this group?
-- 
John Cowan     email@example.com
I am a member of a civilization. --David Brin