From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 08:14:47 CST
On 2005/01/20 13:31, Philippe Verdy at vpi92@yahoo.fr wrote:
> Hans Aberg <haberg@math.su.se> wrote:
>> In fact, one idea might be to add \xFFFE and \xFFFF as delimiters for
>> file format markers. Then programs that do not need such markers need
>> not deal with them. Other programs can make use of them, or simply
>> remove them at will.
>> Such markers could also be used to alter the format within the
>> same stream.
>
> What a horrible idea! Not only are you rejecting the idea of the BOM, but
> now you want to introduce reassignments (of code points that are already
> immutably defined as NON-CHARACTERS) that will BREAK the existing STANDARD,
> which DOES use the fact that FFFE and FFFF are non-characters to
> reliably recognize byte-order marks and UTF encoding forms!
Sorry, there is a typo in my text quoted above: one will have to use \xFEFF
in order to know that the stream is not byte swapped. See below, though.
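
To illustrate the byte-swapping point, here is a minimal Python sketch. It
is only an illustration under stated assumptions, not part of any proposal:
the function name byte_order is hypothetical, and the stream is assumed to
consist of 16-bit code units. Reading the first two bytes as a big-endian
integer yields 0xFEFF when the byte order matches, and 0xFFFE (a
noncharacter) when the stream was written with the opposite byte order.

```python
import struct

BOM = 0xFEFF      # U+FEFF, the byte order mark
SWAPPED = 0xFFFE  # U+FFFE, a noncharacter: what U+FEFF looks like byte swapped


def byte_order(first_two: bytes) -> str:
    """Guess the byte order of a 16-bit stream from its first two bytes.

    Hypothetical helper for illustration; expects exactly two bytes.
    """
    (value,) = struct.unpack(">H", first_two)  # interpret as big-endian
    if value == BOM:
        return "big-endian"
    if value == SWAPPED:
        return "little-endian"  # we read U+FEFF with its bytes swapped
    return "unknown"  # no mark at the start of the stream


print(byte_order(b"\xfe\xff"))
print(byte_order(b"\xff\xfe"))
print(byte_order(b"\x00A"))
```

Because U+FFFE is never a valid character, seeing it at the start of a
stream can only mean a swapped mark, which is why \xFEFF rather than \xFFFE
has to be the value actually written.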
> I strongly reject such an idea. Accept the idea of BOMs as they are, and
> then accept that they already expect that FFFF and FFFE won't EVER be
> used within encoded texts.
First of all, I want the BOM requirement to be dropped from UTF-8, or else
a new variation of UTF-8 invented that does not have a BOM requirement.
(This latter approach does not seem prudent, as one should keep the number
of encodings down.)
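
As a sketch of what a reader that tolerates, but does not require, a
leading mark in UTF-8 can look like, Python's standard "utf-8-sig" codec
can be used. The helper name read_utf8 and the sample data are assumptions
made up for this illustration.

```python
data_with_mark = b"\xef\xbb\xbfhello"  # UTF-8 bytes preceded by an encoded U+FEFF
data_plain = b"hello"                  # the same text without the mark


def read_utf8(data: bytes) -> str:
    """Decode UTF-8, silently dropping one leading U+FEFF if it is present."""
    return data.decode("utf-8-sig")


print(read_utf8(data_with_mark) == read_utf8(data_plain))
# With the plain "utf-8" codec the mark survives as a character in the text:
print(repr(data_with_mark.decode("utf-8")[0]))
```

A program that does not care about the mark can thus remove it at will,
while one decoding with plain "utf-8" has to deal with U+FEFF itself.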
But then there seems to be a need for a method of indicating file encodings
through the file contents. What one wants then is that this indicator cannot
be confused with the Unicode data proper. Further, this file contents
indicator should ideally be independent of encodings like UTF-8, ...
So one might then agree that it is admissible, but not required, to use it
not only to indicate the encoding of the whole file, but also to shift
encodings within a stream. I do not push for this myself; I am only
indicating that if one should admit file contents indicators, this might be
a way to go.
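
For comparison, the existing way of indicating an encoding through file
contents is the signature at the start of the file. The following Python
sketch shows that approach only; it does not implement the delimiter idea
discussed above. The function name sniff is hypothetical, and the
signature constants come from Python's standard codecs module.

```python
import codecs

# Longer signatures come first: the UTF-32 LE signature (FF FE 00 00)
# begins with the UTF-16 LE signature (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff(data: bytes):
    """Return (encoding name or None, data with any signature removed).

    Caveat: UTF-16 LE text whose first character after the mark is U+0000
    would be reported as UTF-32 LE by this simple check.
    """
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name, data[len(signature):]
    return None, data  # no indicator: the caller must know the encoding


encoding, payload = sniff(b"\xff\xfeA\x00")
print(encoding, payload.decode(encoding))
```

A program that does not need the indicator can simply keep the second
element of the result and ignore the first.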
Hans Aberg