Re: UTF-8 signature in web and email

From: John Cowan (cowan@mercury.ccil.org)
Date: Fri May 25 2001 - 07:08:18 EDT

Next message: B: "Re: A Europe of fonts"
Previous message: John Cowan: "Re: A Europe of fonts"
In reply to: Bill Kurmey: "RE: UTF-8 signature in web and email"

Bill Kurmey scripsit:

> Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
> octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?

Theoretically yes.

> Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is
> encoded in UTF-8? Should folks be concerned that the IETF RFC-2279 and
> RFC-2781 refer to UTF-8 and UTF-16 as "a transformation format of ISO
> 10646" with UTF-8 on the Standards Track? Will the discrepancy between the
> Unicode and ISO versions be synchronized?

Yes, it will. An amendment to 10646 is going through the process that will
cut off all planes above 0x10 (everything past U+10FFFF), since there is
obviously no need for them.
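
To make the effect of that cutoff concrete, here is a rough sketch of the
octet-count arithmetic for the 1-6 octet scheme described in RFC 2279. The
names RFC2279_LIMITS, octets_needed and plane are mine, chosen for
illustration only; they do not come from either standard.

```python
# Largest scalar value representable in 1, 2, ... 6 octets under the
# original (RFC 2279) UTF-8 scheme. Illustrative table, not a normative one.
RFC2279_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]

LAST_PLANE = 0x10               # highest plane left after the cutoff
LAST_SCALAR = 0x10FFFF          # last code point of plane 0x10


def plane(scalar: int) -> int:
    """Return the plane number (bits above the low 16) of a scalar value."""
    return scalar >> 16


def octets_needed(scalar: int) -> int:
    """Return how many octets the 1-6 octet scheme uses for a scalar value."""
    if scalar < 0:
        raise ValueError("scalar values are non-negative")
    for count, limit in enumerate(RFC2279_LIMITS, start=1):
        if scalar <= limit:
            return count
    raise ValueError("scalar value does not fit in 31 bits")


if __name__ == "__main__":
    for value in (0x41, 0x20AC, LAST_SCALAR, 0x200000, 0x7FFFFFFF):
        print(f"{value:#x}: plane {plane(value):#x}, "
              f"{octets_needed(value)} octet(s)")
```

Nothing in planes 0 through 0x10 needs more than 4 octets, so once the higher
planes are gone the 5- and 6-octet forms simply have nothing left to encode.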

> I realize that in practical terms the discrepancy may be only common sense
> at this time until substantially more scalar values are assigned, but would
> it not become a concern if ISO decides to retain its original method of
> assigning 1-6 octets as specified in RFC-2279?

Don't worry about it.

> Finally, which method of encoding a scalar value in UTF-8 are Internet
> software developers using, the ISO method as specified in the RFCs possibly
> with Unicode as an optional variant subset?

Not a "variant" subset, an exact subset. And all the oodles of ISO codepoints
that aren't available to Unicode are firmly unused. (The only technical
exception is a couple of Private Use Areas in the high planes, but those
too are being cut off soon.)
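
A minimal sketch of what "exact subset" means in practice, assuming the bit
layout given in RFC 2279: a hand-written 1-6 octet encoder (encode_rfc2279 is
a hypothetical name of my own) yields byte-for-byte the same output as
Python's built-in UTF-8 codec for every sampled scalar value up to U+10FFFF.
Surrogates are left out of the samples because the built-in codec refuses to
encode them.

```python
# (octet count, largest scalar value, lead-octet marker) per length class,
# following the bit layout in RFC 2279. Illustrative, not normative.
_CLASSES = (
    (2, 0x7FF, 0xC0),
    (3, 0xFFFF, 0xE0),
    (4, 0x1FFFFF, 0xF0),
    (5, 0x3FFFFFF, 0xF8),
    (6, 0x7FFFFFFF, 0xFC),
)


def encode_rfc2279(scalar: int) -> bytes:
    """Encode one scalar value using the original 1-6 octet UTF-8 scheme."""
    if not 0 <= scalar <= 0x7FFFFFFF:
        raise ValueError("scalar value must fit in 31 bits")
    if scalar <= 0x7F:
        return bytes([scalar])          # single octet, identical to ASCII
    for count, limit, lead in _CLASSES:
        if scalar <= limit:
            break
    trail = []
    for _ in range(count - 1):
        trail.append(0x80 | (scalar & 0x3F))   # 10xxxxxx continuation octet
        scalar >>= 6
    # The remaining high bits go into the lead octet, then the continuation
    # octets follow from most to least significant.
    return bytes([lead | scalar] + trail[::-1])


if __name__ == "__main__":
    samples = [0x41, 0xE9, 0x20AC, 0x10000, 0x10FFFF]
    for value in samples:
        same = encode_rfc2279(value) == chr(value).encode("utf-8")
        print(f"U+{value:04X}: {encode_rfc2279(value).hex(' ')} same={same}")
    # Beyond U+10FFFF only the 1-6 octet scheme has an answer.
    print(encode_rfc2279(0x7FFFFFFF).hex(" "))
```

A decoder written for the longer scheme therefore reads Unicode's UTF-8
unchanged; the only difference is whether octet sequences for values beyond
U+10FFFF are accepted or rejected.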

> I don't think the 0x0A was a design choice for Unix, it was simply the way
> DEC distinguished its hardware and software from other manufacturers for
> reasons similar to Friden, avoiding patent infringement and/or Trademark
> litigation.

No, all actual DEC operating systems used the CRLF convention, which was
inherited by CP/M and thence MS-DOS and Windows.

-- 
John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT