From: Arcane Jill (firstname.lastname@example.org)
Date: Fri Jan 21 2005 - 08:56:41 CST
What with all the BOM difficulties, and the fact that U+FEFF doubles up as ZERO
WIDTH NO-BREAK SPACE, a new possibility occurred to me.
Imagine if the codepoint U+D7FD were reserved as NOP, having properties which
essentially made it completely ignorable and invisible. It could simply be
thrown away, wherever it was encountered.
Now, it just so happens that its byte-swapped partner, 0xFDD7, is a
non-character codepoint, just like a byte-swapped BOM.
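
To make the byte-swap claim concrete, here is a minimal Python sketch. The helper names byte_swap16 and is_noncharacter are made up for illustration; the noncharacter ranges (U+FDD0..U+FDEF, plus every code point ending in FFFE or FFFF) are the ones the Unicode Standard defines.

```python
def byte_swap16(unit):
    """Swap the two bytes of a 16-bit code unit."""
    return ((unit & 0xFF) << 8) | (unit >> 8)


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# The proposed NOP and the existing BOM both swap to noncharacters.
for cp in (0xD7FD, 0xFEFF):
    swapped = byte_swap16(cp)
    print(f"U+{cp:04X} -> U+{swapped:04X}", is_noncharacter(swapped))
```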
It also just so happens that in the Unicode roadmap
(http://www.unicode.org/roadmaps/bmp/), sandwiched between "Hangul Syllables"
and "High-half zone of UTF-16" is a little slot labelled "???" into which
U+D7FD would fit. In other words, it's not roadmapped for anything else.
Also, U+D7FD is just three codepoints away from the surrogates. An irrelevant fact,
but nice nonetheless.
Now imagine, if you will, that at some time in the future, both uses of U+FEFF
are deprecated. U+D7FD could then take over as the new byte order marker -
except that /this/ choice will cause no problems for Unix. Why not? Because
Unix likes streams and filters, and it would be the work of a moment to feed
text through a filter that throws away any and all occurrences of NOP. Unlike
the existing BOM, the new NOP character would not be stateful - it wouldn't
matter /where/ in the stream it occurred. There need be no concept of the
"beginning" of a stream - you can stick NOP in wherever you like (including at
the beginning), and a simple, stateless filter can just throw them all away.
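
As a rough sketch of what such a filter might look like (in Python, working on already-decoded text; the names NOP and strip_nop are hypothetical, and U+D7FD is of course only the code point suggested here, not an assigned character):

```python
import sys

NOP = "\ud7fd"  # hypothetical NOP code point from this post; unassigned in Unicode


def strip_nop(chunks):
    """Yield each chunk with every NOP removed.

    No state is carried from one chunk to the next, so it makes no
    difference where the stream starts or how it is split up.
    """
    for chunk in chunks:
        yield chunk.replace(NOP, "")


if __name__ == "__main__":
    # Use as a filter: text in on stdin, same text minus NOPs on stdout.
    for cleaned in strip_nop(sys.stdin):
        sys.stdout.write(cleaned)
```

Compare this with stripping a BOM, where the filter has to know whether it is looking at the very first character of the stream.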
...and you'd still have the advantage that if you encounter a
byte-reversed-NOP, you'll know you've the endianness wrong.
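
A minimal sketch of that check, assuming raw UTF-16 bytes of unknown byte order (the function name guess_byte_order is made up, and it simply reports the first NOP or swapped NOP it sees, wherever that happens to be):

```python
NOP_UNIT = 0xD7FD      # proposed NOP, as a 16-bit code unit
SWAPPED_UNIT = 0xFDD7  # what a NOP looks like when read with the wrong byte order


def guess_byte_order(data):
    """Return "big", "little", or None if the data contains no NOP at all."""
    for i in range(0, len(data) - 1, 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        if unit == NOP_UNIT:
            return "big"
        if unit == SWAPPED_UNIT:
            # Read as big-endian we got a noncharacter, so the data is little-endian.
            return "little"
    return None


sample = "a\ud7fdb"
print(guess_byte_order(sample.encode("utf-16-be")))
print(guess_byte_order(sample.encode("utf-16-le")))
```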
Of course, existing text may still have ZWNBS in it, but such is the nature of
This is just a natty idea, not a formal proposal. Tell me what you think, guys.
(Especially Unix types). Stateless NOP seems to me like it would be easier to
magic away than stateful BOM, but that's just my opinion. (And I'm often