So how about U+D7FD for a NOP then?

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Fri Jan 21 2005 - 08:56:41 CST


    What with all the BOM difficulties, and the fact that U+FEFF doubles up as ZERO
    WIDTH NO-BREAK SPACE, a new possibility occurred to me.

    Imagine if the codepoint U+D7FD were reserved as NOP, having properties which
    essentially made it completely ignorable and invisible. It could simply be
    thrown away wherever it was encountered.

    Now, it just so happens that its byte-swapped partner, U+FDD7, is a
    non-character codepoint, just like a byte-swapped BOM (U+FFFE).
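    To make the symmetry concrete, here is a small Python sketch that byte-swaps a
    16-bit code unit and checks whether the result is a noncharacter. The helper
    names byte_swap16 and is_noncharacter are illustrative, not from any library;
    the noncharacter test covers the standard set (U+FDD0..U+FDEF plus the last
    two code points of every plane).

```python
def byte_swap16(cp: int) -> int:
    """Swap the two bytes of a 16-bit code unit (illustrative helper)."""
    return ((cp & 0xFF) << 8) | (cp >> 8)

def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# The proposed NOP (hypothetical) and the real BOM both swap into noncharacters.
for name, cp in [("proposed NOP", 0xD7FD), ("BOM", 0xFEFF)]:
    swapped = byte_swap16(cp)
    print(f"{name}: U+{cp:04X} -> U+{swapped:04X}, noncharacter: {is_noncharacter(swapped)}")
```

    Both lines report "noncharacter: True", which is what lets either mark
    double as an endianness detector.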

    It also just so happens that in the Unicode roadmap
    (http://www.unicode.org/roadmaps/bmp/), sandwiched between "Hangul Syllables"
    and "High-half zone of UTF-16" is a little slot labelled "???" into which
    U+D7FD would fit. In other words, it's not roadmapped for anything else.

    Also, U+D7FD is just three codepoints below the start of the surrogate range
    (U+D800), and not far from the PUA (U+E000). An irrelevant fact, but nice
    nonetheless.

    Now imagine, if you will, that at some time in the future, both uses of U+FEFF
    are deprecated. U+D7FD could then take over as the new byte order marker -
    except that /this/ choice will cause no problems for Unix. Why not? Because
    Unix likes streams and filters, and it would be the work of a moment to feed
    text through a filter that throws away any and all occurrences of NOP. Unlike
    the existing BOM, the new NOP character would not be stateful - it wouldn't
    matter /where/ in the stream it occurred. There need be no concept of the
    "beginning" of a stream - you can stick NOP in wherever you like (including at
    the beginning), and a simple, stateless filter can just throw them all away.
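    A stateless NOP filter of this kind might look like the following Python
    sketch. It assumes the hypothetical assignment of U+D7FD as NOP; strip_nops,
    strip_initial_bom and filter_stream are made-up names. The BOM stripper is
    there only for contrast: it needs an at_start flag, whereas the NOP filter
    gives the same result however the input is chunked.

```python
import io
from typing import Iterable, Iterator, TextIO

# Hypothetical NOP from the proposal; U+D7FD has no such meaning in Unicode.
NOP = "\uD7FD"
BOM = "\uFEFF"

def strip_nops(chunks: Iterable[str]) -> Iterator[str]:
    """Drop every NOP wherever it occurs; nothing is remembered between chunks."""
    for chunk in chunks:
        yield chunk.replace(NOP, "")

def strip_initial_bom(chunks: Iterable[str]) -> Iterator[str]:
    """For contrast: removing a BOM requires knowing where the stream starts."""
    at_start = True
    for chunk in chunks:
        if at_start and chunk:
            if chunk[0] == BOM:
                chunk = chunk[1:]
            at_start = False  # any later U+FEFF is ZWNBSP and must be kept
        yield chunk

def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Copy decoded text from src to dst with all NOPs removed."""
    for piece in strip_nops(src):
        dst.write(piece)

out = io.StringIO()
filter_stream(io.StringIO("a\uD7FDb\n\uD7FDc\n"), out)
print(repr(out.getvalue()))  # prints 'ab\nc\n'
```

    Calling filter_stream(sys.stdin, sys.stdout) would turn it into an ordinary
    pipeline stage.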

    ...and you'd still have the advantage that if you encounter a
    byte-reversed NOP, you'll know you've got the endianness wrong.
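    Because the NOP may sit anywhere, a reader could infer UTF-16 byte order from
    whichever NOP it meets first. A rough Python sketch, again assuming the
    hypothetical U+D7FD NOP; guess_utf16_order and decode_with_nop are invented
    names, and falling back to big-endian when no NOP is found is an assumption
    (it matches the usual rule for BOM-less UTF-16).

```python
from typing import Optional

NOP = "\uD7FD"  # hypothetical NOP code point from the proposal

def guess_utf16_order(data: bytes) -> Optional[str]:
    """Return the codec implied by the first NOP found in either byte order, else None."""
    for i in range(0, len(data) - 1, 2):  # step over aligned 16-bit code units
        unit = data[i:i + 2]
        if unit == b"\xD7\xFD":  # U+D7FD read big-endian: byte order is BE
            return "utf-16-be"
        if unit == b"\xFD\xD7":  # reads as noncharacter U+FDD7: bytes are reversed, so LE
            return "utf-16-le"
    return None

def decode_with_nop(data: bytes, default: str = "utf-16-be") -> str:
    """Decode using the detected byte order (falling back to `default`) and drop all NOPs."""
    codec = guess_utf16_order(data) or default
    return data.decode(codec).replace(NOP, "")

sample = "hi\uD7FDthere".encode("utf-16-le")
print(guess_utf16_order(sample), decode_with_nop(sample))  # prints: utf-16-le hithere
```

    Since U+FDD7 is a noncharacter, a well-formed text should never produce the
    byte pair that signals "reversed" for any reason other than a NOP read in
    the wrong order.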

    Of course, existing text may still have ZWNBS in it, but such is the nature of
    deprecation.

    This is just a natty idea, not a formal proposal. Tell me what you think, guys.
    (Especially Unix types). Stateless NOP seems to me like it would be easier to
    magic away than stateful BOM, but that's just my opinion. (And I'm often
    wrong).

    Jill


