Re: 8-bit text which is supposed to be UTF-8 but isn't

From: Addison Phillips [GSC]
Date: Sun Jan 30 2000 - 22:25:43 EST

> Where is this group?

One such group is iDNS ( ?)

They have that odd UTF-5 proposal, designed to stay compatible with existing software.
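To make the UTF-5 idea concrete, here is a rough Python sketch. It assumes the mapping I understand the iDNS draft to use: each code point is written as hex nibbles with leading zeros dropped, the first nibble is spelled with a letter from G to V and the rest with 0-9 and A-F, so every name stays within plain letters and digits. The function names are hypothetical, and the mapping should be checked against the draft itself before relying on it.

```python
# Hypothetical sketch of a UTF-5-style encoding (assumed mapping, see above).
LEAD = "GHIJKLMNOPQRSTUV"   # first nibble of each code point: 0..15 -> 'G'..'V'
TRAIL = "0123456789ABCDEF"  # remaining nibbles: 0..15 -> '0'..'F'

def utf5_encode(text: str) -> str:
    out = []
    for ch in text:
        digits = format(ord(ch), "X")          # hex digits, no leading zeros
        out.append(LEAD[int(digits[0], 16)])   # a lead letter marks a new code point
        out.append(digits[1:])                 # remaining hex digits already match TRAIL
    return "".join(out)

def utf5_decode(name: str) -> str:
    chars, value = [], None
    for c in name.upper():                     # DNS names are case-insensitive
        if c in LEAD:                          # a lead letter begins a new code point
            if value is not None:
                chars.append(chr(value))
            value = LEAD.index(c)
        else:                                  # a trailing hex digit extends it
            value = value * 16 + TRAIL.index(c)
    if value is not None:
        chars.append(chr(value))
    return "".join(chars)

print(utf5_encode("A"))     # K1
print(utf5_encode("中"))    # KE2D
print(utf5_decode("ke2d"))  # 中
```

Because the lead letters (G to V) and the trailing digits (0-9, A-F) never overlap, a decoder can find character boundaries without any separator, and uppercasing the input first copes with DNS case-insensitivity.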

Supposedly the IETF has a working group, but I know nothing about it. I do know
that there was a recent expansion of the length of registrable domain names to
63 bytes (the per-label limit the DNS protocol already allowed), which is almost
3x the old limit... or about what one would expect to accommodate the BMP in
UTF-8, where each BMP character takes at most three bytes.
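The arithmetic behind that guess, as a small Python sketch: BMP characters take one to three bytes in UTF-8, so a 63-byte label holds at least 21 of them. `LABEL_LIMIT`, `utf8_len` and `fits_in_label` are illustrative names, not part of any DNS API.

```python
LABEL_LIMIT = 63  # maximum bytes in one DNS label (RFC 1035); illustrative constant

def utf8_len(text: str) -> int:
    # Number of bytes the text occupies once encoded as UTF-8.
    return len(text.encode("utf-8"))

def fits_in_label(text: str) -> bool:
    return utf8_len(text) <= LABEL_LIMIT

for sample in ("a", "é", "中"):  # 1-, 2- and 3-byte characters in UTF-8
    per_char = utf8_len(sample)
    print(f"{sample!r}: {per_char} byte(s), {LABEL_LIMIT // per_char} per label")

print(fits_in_label("中" * 21), fits_in_label("中" * 22))  # True False
```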


Addison P. Phillips
Senior Globalization Consultant
Global Sight Corporation
101 Metro Drive, Suite 750
San Jose, California 95110
(+1) 408.350.3600 - Telephone
(+1) 408.350.3601 - Fax


----- Original Message -----
From: John Cowan
To: Unicode List
Sent: Sunday, January 30, 2000 6:53 PM
Subject: Re: 8-bit text which is supposed to be UTF-8 but isn't

> Dan scripsit:
> > ISO 10646 is 31 bits. All possible values should be allowed.
> > I do not know why Unicode has decided to extend its code space beyond
> > 16 bits but not to all 31 bits of ISO 10646.
> JTC1/SC2/WG2 have declared that they will not go past 0010FFFF,
> except for the (de facto deprecated) private-use areas
> at 00E00000-00FFFFFF and at 60000000-7FFFFFFF.
> > But that is no reason not to allow the full 31 bits in UTF-8-encoded
> > text.
> It is, indeed, the reason.
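For a sense of what the 0010FFFF ceiling buys, here is a hypothetical encoder following the original 31-bit UTF-8 layout of RFC 2279 (a sketch, not production code). Everything up to U+10FFFF fits in four bytes; the five- and six-byte forms only come into play above U+1FFFFF, which covers the private-use ranges mentioned above. Current decoders, Python's included, reject those longer sequences.

```python
# Sketch of the original 31-bit UTF-8 layout (RFC 2279); not for production use.
# (upper bound, sequence length): code points up to the bound need that many bytes.
RANGES = [(0x7F, 1), (0x7FF, 2), (0xFFFF, 3),
          (0x1FFFFF, 4), (0x3FFFFFF, 5), (0x7FFFFFFF, 6)]

def seq_len(cp: int) -> int:
    for bound, length in RANGES:
        if 0 <= cp <= bound:
            return length
    raise ValueError(f"{cp:#x} is outside the 31-bit range")

def utf8_encode_31bit(cp: int) -> bytes:
    n = seq_len(cp)
    if n == 1:
        return bytes([cp])
    out = []
    for _ in range(n - 1):           # continuation bytes carry 6 bits each: 10xxxxxx
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    lead = (0xFF00 >> n) & 0xFF      # n leading 1-bits: 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    out.append(lead | cp)
    return bytes(reversed(out))

for cp in (0x4E2D, 0x10FFFF, 0x00E00000, 0x7FFFFFFF):
    print(f"U-{cp:08X} -> {utf8_encode_31bit(cp).hex(' ')}")
```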
> > You should also specify that Unicode Technical Report #15 Normalization
> > Form C be used. This will simplify much encoding/decoding and help with
> > searching and case-insensitive comparisons.
> I would even go further and require Form KC (no compatibility characters)
> as well, at least in headers if not in body text.
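A quick illustration using Python's standard `unicodedata.normalize`: Form C makes a decomposed and a precomposed é compare equal, and Form KC goes further by replacing compatibility characters such as ligatures, fullwidth forms and circled digits with their plain equivalents.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True

# Form KC also replaces compatibility characters; Form C leaves them alone.
for s in ("\ufb01", "\uff21", "\u2460"):   # fi ligature, fullwidth A, circled digit one
    print(repr(s), repr(unicodedata.normalize("NFC", s)),
          repr(unicodedata.normalize("NFKC", s)))
```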
> > Best of all would be for this to apply everywhere, in both the protocol
> > headers and the body text. The current MIME encodings in headers
> > (RFC 2047 encoded-words) are terrible.
> Agreed. I believe the current draft drops or deprecates those.
> (This is news, not mail, remember.)
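For comparison, here is what one of those RFC 2047 encoded-words looks like next to plain UTF-8, decoded with Python's standard `email.header.decode_header`. The subject text is only an example.

```python
from email.header import decode_header

# An RFC 2047 encoded-word: =?charset?encoding?payload?= (B = base64),
# which keeps the header 7-bit ASCII at the cost of readability.
encoded = "=?UTF-8?B?Q2Fmw6k=?="

[(raw, charset)] = decode_header(encoded)
print(raw, charset)          # the UTF-8 bytes plus the declared charset
print(raw.decode(charset))   # Café

# The same text sent as raw 8-bit UTF-8 needs no wrapper at all.
print("Café".encode("utf-8"))
```

Anything that searches or displays such a header has to recognise and decode the wrapper first; raw UTF-8 can be handled directly.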
> > No, case insensitivity should be available for all letters. It is
> > very important to many people. For a protocol to work well, it should
> > be implemented in a well-defined way, such as section 2.3 of Unicode
> > Technical Report #21.
> But why do case folding at all? Simply forbid the use of uppercase
> characters.
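Both options are easy to sketch in Python. The first uses `str.casefold()`, which applies the Unicode default full case folding (I am assuming this is close to what TR #21 section 2.3 describes, not that it is identical); the second simply rejects names containing uppercase or titlecase letters. The function names are hypothetical.

```python
import unicodedata

def names_match(a: str, b: str) -> bool:
    # Option 1: compare after Unicode default full case folding.
    return a.casefold() == b.casefold()

def has_uppercase(name: str) -> bool:
    # Option 2: forbid uppercase (Lu) and titlecase (Lt) letters outright.
    return any(unicodedata.category(c) in ("Lu", "Lt") for c in name)

print(names_match("STRASSE", "straße"))                  # True: ß folds to "ss"
print(has_uppercase("straße"), has_uppercase("Straße"))  # False True
```

Folding catches mappings like ß to "ss" that a simple lowercase comparison misses; forbidding uppercase avoids folding altogether, but "straße" and "strasse" would then remain distinct names.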
> > As there is a group working on getting international characters into
> > DNS, you may want to wait a little and see their results. They may
> > affect the Usenet News protocol.
> Where is this group?
> --
> John Cowan
> I am a member of a civilization. --David Brin

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT