From: Markus Scherer (email@example.com)
Date: Tue Nov 05 2002 - 16:52:31 EST
Mark Davis wrote:
> Little probability that right double quote would appear at the start of a
> document either. Doesn't mean that you are free to delete it (*and* say that
> you are not modifying the contents).
This points to a pragmatic way to deal with this issue:
If software claims that it does not modify the contents of a document *except* for an initial U+FEFF,
then it can do whatever it wants with an initial U+FEFF. If the whole discussion hinges on what is
allowed *if software claims to not modify text*, then one need not make that claim so absolutely.
Similarly, software may claim to not modify text contents *except* that it may transform line
endings into LS (U+2028 LINE SEPARATOR) or any other convention.
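A minimal Python sketch of the two exceptions above, assuming the text has already been decoded to a `str`. The names `strip_initial_bom`, `to_line_separators` and `normalize` are hypothetical. Only a U+FEFF at offset 0 is dropped; an interior U+FEFF (ZERO WIDTH NO-BREAK SPACE) is left alone, and CRLF, CR and LF are each rewritten to U+2028.

```python
import re

BOM = "\ufeff"  # U+FEFF as it appears after decoding
LS = "\u2028"   # U+2028 LINE SEPARATOR

# CRLF is listed first so it is replaced as one line ending, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def strip_initial_bom(text: str) -> str:
    """Exception 1: drop a U+FEFF only at the very start of the text."""
    return text[1:] if text.startswith(BOM) else text


def to_line_separators(text: str) -> str:
    """Exception 2: rewrite CRLF, CR and LF line endings to U+2028."""
    return _LINE_END.sub(LS, text)


def normalize(text: str) -> str:
    """Apply both exceptions; every other character passes through unchanged."""
    return to_line_separators(strip_initial_bom(text))
```

Everything outside these two documented exceptions is untouched, which is what lets such software still claim it does not otherwise modify the text.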
Not all software claims to not modify text, nor needs to claim that, and a lot of software does
> I agree that when the UTC decides that a BOM is *only* to be used as a
> signature, and that it would be ok to delete it anywhere in a document (like
> a non-character), then we are in much better shape. This was, as a matter of
> fact proposed for 3.2, but not approved. If we did that for 4.0, then there
> would be much less reason to distinguish UTF-8 'withBOM' from UTF-8
This would be good. The above would still be useful.
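If, as Mark describes, U+FEFF were defined purely as a signature that may be deleted anywhere, a reader could drop every U+FEFF after decoding, and it would no longer matter whether a UTF-8 file had been written with or without a BOM. A minimal Python sketch under that assumption; `read_utf8_signature_only` and `read_utf8_initial_bom_only` are hypothetical names, and the first function is *not* correct under the rules as they stand, where an interior U+FEFF is still ZERO WIDTH NO-BREAK SPACE.

```python
def read_utf8_signature_only(data: bytes) -> str:
    """Decode UTF-8 and delete every U+FEFF, wherever it occurs.

    Assumes the proposed (not adopted) semantics in which U+FEFF is only a
    signature and, like a noncharacter, may be removed anywhere.
    """
    return data.decode("utf-8").replace("\ufeff", "")


def read_utf8_initial_bom_only(data: bytes) -> str:
    """Python's built-in "utf-8-sig" codec drops one leading BOM and keeps the rest."""
    return data.decode("utf-8-sig")
```

The two readers agree on files that contain at most an initial BOM; they differ only on interior U+FEFF, which is exactly the case the proposal would make deletable.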
Joseph's request is actually different from the discussion of what is "the right thing": he mostly
wants labels that distinguish between the different actions to be taken. If there is no consensus
here on such labels, then Joseph may need to use selectors in his configuration file that are
separate from the charset labels.
Type  charset  BOM      Comment
.txt  UTF-8    require  We want plain text files to have BOM to distinguish
                        from legacy codepage files
.xml  UTF-8    forbid   Some XML processors may not cope with BOM
.htm  UTF-8    maybe    We want HTML to be UTF-8 but will not insist on BOM
.rc   not UTF  n/a      Unfortunately compiler insists on these being codepage.
.rc   UTF-16   require  Alternative to the previous line.
.swt  ASCII    n/a      Nonlocalizable internal format, must be ASCII.
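One way to keep the BOM selector separate from the charset label is a per-extension table like the one above, checked against a file's leading bytes. The Python sketch below is a hypothetical illustration: `BomPolicy`, `POLICIES`, `has_bom` and `check` are invented names, only the UTF-16 alternative is used for `.rc`, "maybe" accepts either form, and the ASCII row is checked only for 7-bit content.

```python
import codecs
from enum import Enum


class BomPolicy(Enum):
    REQUIRE = "require"
    FORBID = "forbid"
    MAYBE = "maybe"
    NA = "n/a"


# Hypothetical configuration mirroring the table: extension -> (charset, BOM policy).
# For .rc only the UTF-16 alternative is listed; the codepage variant has no BOM rule.
POLICIES = {
    ".txt": ("UTF-8", BomPolicy.REQUIRE),
    ".xml": ("UTF-8", BomPolicy.FORBID),
    ".htm": ("UTF-8", BomPolicy.MAYBE),
    ".rc": ("UTF-16", BomPolicy.REQUIRE),
    ".swt": ("ASCII", BomPolicy.NA),
}

# Byte sequences that an initial U+FEFF produces in each charset.
_SIGNATURES = {
    "UTF-8": (codecs.BOM_UTF8,),
    "UTF-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
}


def has_bom(data: bytes, charset: str) -> bool:
    """True if data starts with a BOM for the given charset."""
    return any(data.startswith(sig) for sig in _SIGNATURES.get(charset, ()))


def check(ext: str, data: bytes) -> bool:
    """Return True if a file's bytes satisfy the rule configured for its extension."""
    charset, policy = POLICIES[ext]
    if charset == "ASCII":
        # No BOM rule applies; just require pure 7-bit content.
        return data.isascii()
    if policy is BomPolicy.REQUIRE:
        return has_bom(data, charset)
    if policy is BomPolicy.FORBID:
        return not has_bom(data, charset)
    # MAYBE: accept the file with or without a BOM.
    return True
```

Because the BOM rule lives in its own column, the charset value stays a plain "UTF-8" or "UTF-16" and needs no "withBOM" variant.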
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.
This archive was generated by hypermail 2.1.5 : Tue Nov 05 2002 - 17:32:40 EST