On 06/20/2000 08:20:53 PM <dewell@compuserve.com> wrote:
[snip]
>It may be useful shorthand to define the term "UTF-8N" to refer to UTF-8 text
>that does not begin with a BOM, and reserve the term "UTF-8" for text that
>*does* begin with a BOM,
"UTF-8" does not currently indicate the definite presence of a BOM, and so
should not be used to indicate it.
>but the fact is that both are really UTF-8, and people
>will use the term "UTF-8" to refer to both.
And rightly so.
> Adding (let alone registering) a
>new charset name to express this relatively minor difference will make it look
>(as it does to Juliusz) like there are more Unicode encoding forms than there
>really are.
We don't want distinct encoding schemes (schemes, I think, not forms) for
the UTF-8 encoding form that are distinguished by the presence or the
absence of a BOM. Presence or absence of a BOM doesn't constitute a
difference in encoding scheme for UTF-8, or even for UTF-16, for that
matter, because it is something separate from the character stream itself.
UTF-8 files with and without a BOM serialize the character
representations into bytes (octets) in exactly the same way. That
serialization is the basis for distinguishing between encoding schemes, and
since there isn't a difference, only one encoding scheme is involved in both
cases.
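
To illustrate the point about identical serialization, here is a minimal
sketch in Python. It assumes Python's built-in "utf-8" and "utf-8-sig" codecs
(the latter writes the three-octet signature EF BB BF first); the sample
string is made up for the example and is not taken from this thread.

```python
import codecs

# Hypothetical sample text with non-ASCII characters (an assumption for
# illustration only).
text = "caf\u00e9 \u20ac"

# Serialize the same characters without and with a leading BOM.
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# The only difference is the three-octet signature at the front;
# every character is serialized into the same octets in both cases.
assert with_bom == codecs.BOM_UTF8 + plain
assert with_bom[len(codecs.BOM_UTF8):] == plain

# A decoder that tolerates an optional signature recovers the same text
# from either byte sequence.
assert plain.decode("utf-8-sig") == text
assert with_bom.decode("utf-8-sig") == text
```

Under those assumptions, the octets that follow the signature are the same
octets produced without it, which is why a single encoding scheme covers both
cases.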
Peter Constable