RE: Conformance (was UTF, BOM, etc)

From: Lars Kristan (lars.kristan@hermes.si)
Date: Sat Jan 22 2005 - 03:44:44 CST

Next message: Lars Kristan: "RE: Byte-oriented lexer generator for Unicode"

Previous message: Lars Kristan: "RE: The "JDGI" file grows"
Maybe in reply to: Arcane Jill: "Conformance (was UTF, BOM, etc)"
Next in thread: Christopher Fynn: "Re: Conformance (was UTF, BOM, etc)"
Reply: Christopher Fynn: "Re: Conformance (was UTF, BOM, etc)"
Reply: Peter Kirk: "Re: Conformance (was UTF, BOM, etc)"

Richard T. Gillam wrote:

> Peter Kirk had this one right. Certain encoding SCHEMES treat the byte
> sequence FEFF (or some variant of it) as a byte order mark when it
> appears at the beginning of a text stream. In these contexts, it's not
> a character at all; it's part of the communication protocol.

Not a character at all? Very well put! That is exactly what it should be: a
non-character. So not only the reverse-BOM but also the BOM itself should be
a non-character.

> A process operating on the actual text, after it's been deserialized
> and converted into an in-memory representation (an encoding FORM),
> doesn't see it.

And it might treat the BOM as a NOP. Whether this is done at processing
time or at deserialization is up to the implementation. Either could prove
to be impractical or dangerous. Just a thought.
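
To make the two options concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions: the input is UTF-8, and the
function names are hypothetical, not taken from any library or from this
thread.

```python
BOM = "\ufeff"  # U+FEFF, used as the byte order mark

def deserialize_strip_bom(data: bytes) -> str:
    """Option 1: drop a single leading BOM at deserialization time."""
    text = data.decode("utf-8")
    return text[1:] if text.startswith(BOM) else text

def count_ignoring_bom(text: str) -> int:
    """Option 2: keep the BOM in memory but treat U+FEFF as a no-op
    while processing (here: while counting characters)."""
    return sum(1 for ch in text if ch != BOM)

data = b"\xef\xbb\xbfabc"              # UTF-8 BOM followed by "abc"
stripped = deserialize_strip_bom(data)  # BOM never reaches the process
kept = data.decode("utf-8")             # BOM stays in the in-memory text
```

Note that option 2 as written also skips a U+FEFF in the middle of the
text, where it would be a ZWNBSP rather than a BOM. That is one way the
processing-time approach can turn out to be dangerous.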

> Other encoding schemes don't treat FEFF as special. A process operating
> on the actual text after it's been deserialized will see this as the
> character U+FEFF, the ZWNBSP.
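
The difference is easy to observe with Python's standard codecs, shown here
only as an illustration: "utf-16" and "utf-8-sig" consume a leading BOM,
while "utf-16-be" and plain "utf-8" pass it through as U+FEFF. The variable
names are just for this sketch.

```python
# The same bytes, decoded under schemes that do and do not treat a
# leading BOM as part of the protocol.
utf16_bytes = b"\xfe\xff\x00a"     # FE FF, then "a" in big-endian UTF-16
utf8_bytes = b"\xef\xbb\xbfa"      # UTF-8 encoding of U+FEFF, then "a"

with_bom_scheme = utf16_bytes.decode("utf-16")     # BOM consumed
without_bom_scheme = utf16_bytes.decode("utf-16-be")  # BOM kept as U+FEFF

utf8_sig = utf8_bytes.decode("utf-8-sig")  # BOM consumed
utf8_plain = utf8_bytes.decode("utf-8")    # BOM kept as U+FEFF

for label, text in [
    ("utf-16", with_bom_scheme),
    ("utf-16-be", without_bom_scheme),
    ("utf-8-sig", utf8_sig),
    ("utf-8", utf8_plain),
]:
    # Show code points so the otherwise invisible U+FEFF is visible.
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```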

This is where the problem lies. In an effort to make the BOM as harmless as
possible, sloppiness was allowed. A lot is said about differentiating text
from binary data. Well, then the people who say it should be just as strict
about differentiating plain text from serialized documents.

Back to Notepad - it produces documents, not plain text. For that matter,
Microsoft should provide a plain text editor, or extend Notepad with that
capability. But it is really up to them. They can leave it to other people
to do it. After all, in Windows, you don't need a text editor. There is no
plain text in Windows. Which is sometimes good, and sometimes bad.

Lars

This archive was generated by hypermail 2.1.5 : Sat Jan 22 2005 - 03:45:30 CST