Re: Conformance

From: Hans Aberg (haberg@math.su.se)
Date: Fri Jan 21 2005 - 12:47:21 CST

Next message: Hans Aberg: "Re: Conformance (Was: 32'nd bit & UTF-8)"

Previous message: Andy Heninger: "Re: UTF-8 'BOM' (was RE: Subject: Re: 32'nd bit & UTF-8)"
In reply to: Peter Kirk: "Re: Conformance (was UTF, BOM, etc)"
Next in thread: Richard T. Gillam: "RE: Conformance (was UTF, BOM, etc)"

On 2005/01/21 16:33, Peter Kirk at peterkirk@qaya.org wrote:

>> [Jill's Important Question 2]:
>> And the second question I must ask is: if a file is labelled by some
>> higher level protocol (for example, Unix locale, HTTP header, etc) as
>> "UTF-8", should a conformant process interpret that as UTF-8, the
>> Unicode Encoding FORM (which prohibits a BOM) or as UTF-8, the Unicode
>> Encoding SCHEME (which allows one)?
>>
> Excellent question! And what if it is not labelled at all, but expected
> to be UTF-8?

Here, it seems, the higher level protocol should define what happens to a
BOM, just as it would for any other character. "UTF-8" by itself only means
that the byte sequence is well formed according to UTF-8.
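
To make the distinction concrete, here is a minimal Python sketch, not a
reference implementation. The function name decode_utf8 and its strip_bom
flag are hypothetical, invented for this illustration; strip_bom stands in
for whatever the higher level protocol decides. Well-formedness is checked
the same way in both cases; only the treatment of a leading U+FEFF differs.

```python
import codecs


def decode_utf8(data: bytes, strip_bom: bool) -> str:
    """Decode bytes as UTF-8 (hypothetical helper for illustration).

    strip_bom models the higher level protocol's decision: it is an
    assumption of this sketch, not something UTF-8 itself prescribes.
    """
    # Raises UnicodeDecodeError if the byte sequence is not well-formed UTF-8.
    text = data.decode("utf-8")
    # A leading U+FEFF is dropped only when the protocol says it is a signature.
    if strip_bom and text.startswith("\ufeff"):
        return text[1:]
    return text


# The same well-formed byte sequence, read under two different protocol rules.
payload = codecs.BOM_UTF8 + "hello".encode("utf-8")
as_character = decode_utf8(payload, strip_bom=False)  # U+FEFF kept as a character
as_signature = decode_utf8(payload, strip_bom=True)   # U+FEFF treated as a BOM
```

Python's built-in "utf-8" codec keeps a leading U+FEFF, while its separate
"utf-8-sig" codec strips one, which is one example of that choice being made
above the level of the encoding itself.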

The Unicode standard, it seems, is prone to misinterpretation on this
point, and should be rewritten. There appears to be no need for it to mention
the BOM, except as a curiosity note that programs and other protocols may
treat it differently from its U+FEFF character semantics. In this respect,
it is no different from any other valid character sequence in Unicode. Shell
script or PS markers do not make those files non-conforming to Unicode.
Unicode, as a character protocol, just provides the characters and
encodings; it does not otherwise enforce any particular program behavior.

Hans Aberg

This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 13:05:12 CST