Re: Conformance (Was: 32'nd bit & UTF-8)

From: Hans Aberg (haberg@math.su.se)
Date: Fri Jan 21 2005 - 12:47:21 CST

Next message: Richard T. Gillam: "RE: Byte-oriented lexer generator for Unicode"

Previous message: Hans Aberg: "Re: Conformance"
In reply to: Richard T. Gillam: "RE: Subject: Re: 32'nd bit & UTF-8"
Next in thread: Marcin 'Qrczak' Kowalczyk: "Re: Subject: Re: 32'nd bit & UTF-8"

[Off the list.]

On 2005/01/21 17:49, Richard T. Gillam at rgillam@las-inc.com wrote:

>> So deprecating it seems to be a distinct possibility.
>
> I really wish you'd quit saying this. This simply isn't true. Or, at
> the very least, is EXTREMELY unlikely and very far into the future. As
> several other people have already pointed out to you, the Unicode
> codespace contains room for 1.1 million characters.

This is in fact not the problem. What would "deprecate" even mean in the case
of a character standard? ASCII, the ISO-Latin encodings, etc. will never
become deprecated, even though they may eventually become obsolete. For the
word "deprecate" to make sense, there must be a notion of "Unicode
conformance". If, say, a protocol required that UTF-8/16/32 all be
supported, then UTF-16 could be deprecated in that protocol.

As for the issue of filling up the code points, just wait and see. If they
should be exhausted quickly, that would perhaps require machine-generated
encodings. Unicode perhaps need not support such specialty use.
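
For reference, the "1.1 million" figure quoted above can be checked with a
little arithmetic. This is only a sketch; the constant names are made up for
illustration, and the only inputs are the codespace limit U+10FFFF and the
surrogate range U+D800..U+DFFF.

```python
# Size of the Unicode codespace, U+0000 through U+10FFFF inclusive.
CODE_POINTS = 0x10FFFF + 1

# The codespace is divided into planes of 0x10000 code points each.
PLANES = CODE_POINTS // 0x10000

# Surrogate code points (U+D800..U+DFFF) are reserved for UTF-16
# and can never be assigned to characters.
SURROGATES = 0xDFFF - 0xD800 + 1

# What remains is the number of Unicode scalar values.
SCALAR_VALUES = CODE_POINTS - SURROGATES

print(CODE_POINTS, PLANES, SURROGATES, SCALAR_VALUES)
```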

> Again, many people have addressed this point and you're ignoring them.
> UTF-8 HAS NO BOM. There is nothing in the Unicode standard mandating or
> even encouraging the use of EF BB BF at the beginning of a UTF-8 file.
> That sequence has no special meaning in UTF-8; it's just a zero-width
> non-breaking space. FE FF at the top of a UTF-8 file is just flat
> illegal.

We know that. See the other post in this new thread. The formulation in the
Unicode standard is vacuous and confusing, prone to ambiguous
interpretations, and needs to be changed. There is no need to mention the
BOM at all, except as a curiosity, noting that programs and some protocols
may decide to give it special treatment. In that respect it is no different
from other character sequence markers for shell scripts, PS, etc. Unicode is
just a character encoding, and just provides the characters for use.
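
To make the "special treatment" point concrete, here is a minimal sketch in
Python. The helper names are hypothetical; the "utf-8" and "utf-8-sig" codecs
and codecs.BOM_UTF8 are part of the Python standard library. A plain UTF-8
decode hands back U+FEFF as an ordinary character, a program may choose to
strip it, and FE FF is simply not valid UTF-8.

```python
import codecs

# The byte sequence EF BB BF, i.e. U+FEFF encoded in UTF-8.
BOM_BYTES = codecs.BOM_UTF8


def decode_plain(data: bytes) -> str:
    """Decode as UTF-8 with no special treatment; a leading U+FEFF is kept."""
    return data.decode("utf-8")


def decode_stripping_signature(data: bytes) -> str:
    """Decode with the 'utf-8-sig' codec, which drops one leading U+FEFF."""
    return data.decode("utf-8-sig")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = BOM_BYTES + "hello".encode("utf-8")

print(repr(decode_plain(sample)))                # U+FEFF stays in the text
print(repr(decode_stripping_signature(sample)))  # the program chose to drop it
print(is_valid_utf8(b"\xfe\xff"))                # FE FF is not UTF-8 at all
```

Whether to strip the leading U+FEFF is a decision made by the program or
protocol, not by the character encoding itself.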

Hans Aberg

This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 13:05:23 CST