RE: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

From: jarkko.hietaniemi@nokia.com
Date: Wed Apr 24 2002 - 13:37:39 EDT

Previous message: Theo Veenker: "Re: Whence UniData.txt? (was Re: unidata is big)"
Maybe in reply to: Doug Ewell: "Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")"
Next in thread: David Starner: "Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")"
Next in thread: Florian Weimer: "Re: "UNICODE BOMBER STRIKES AGAIN""
Reply: David Starner: "Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")"

> Why? The problems with a BOM in UTF-8 have to do with it being an
> ASCII-compatible encoding.

Err, no. That's not the point, AFAIK. The point is that traditionally
in UNIX there hasn't been any sort of "marker" or "tag" in the beginning,
UNIX files being flat streams of bytes. The UNIX toolset has been built
with this principle in mind. No metadata in the files. BOM breaks this.

cat file1 file2 file3 > file4

will have three BOMs, two of them in the middle of file4.
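To make the cat case concrete, here is a minimal sketch that models files as byte strings. It assumes each input starts with a UTF-8 BOM (EF BB BF). `naive_cat` joins bytes unchanged, as cat does. `bom_aware_cat` is a hypothetical helper, not an existing tool, that keeps only the first file's BOM:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def naive_cat(*chunks: bytes) -> bytes:
    """Byte-for-byte concatenation, the way cat(1) works."""
    return b"".join(chunks)

def bom_aware_cat(*chunks: bytes) -> bytes:
    """Hypothetical BOM-aware cat: strip a leading BOM from every input but the first."""
    out = []
    for i, chunk in enumerate(chunks):
        if i > 0 and chunk.startswith(UTF8_BOM):
            chunk = chunk[len(UTF8_BOM):]
        out.append(chunk)
    return b"".join(out)

# Assumption: all three "files" start with a BOM.
files = [UTF8_BOM + b"one\n", UTF8_BOM + b"two\n", UTF8_BOM + b"three\n"]
print(naive_cat(*files).count(UTF8_BOM))      # 3
print(bom_aware_cat(*files).count(UTF8_BOM))  # 1
```

The point is that cat itself would have to start interpreting content to do this, which is exactly what the UNIX toolset was designed not to do.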

wc -c file1

would have to skip the BOM to avoid reporting a wrong byte count.
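A minimal sketch of the wc -c discrepancy, assuming a UTF-8 file with a leading BOM. `wc_c` mirrors what wc -c reports (every byte). `payload_bytes` is a hypothetical BOM-aware count, not an option wc actually has:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def wc_c(data: bytes) -> int:
    """What wc -c reports: every byte, BOM included."""
    return len(data)

def payload_bytes(data: bytes) -> int:
    """Hypothetical BOM-aware count that ignores a leading UTF-8 BOM."""
    if data.startswith(UTF8_BOM):
        return len(data) - len(UTF8_BOM)
    return len(data)

data = UTF8_BOM + b"hello\n"
print(wc_c(data))           # 9
print(payload_bytes(data))  # 6
```

The three extra bytes are metadata, not text, yet a byte-oriented tool has no way to tell them apart.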

sort -o file5 file1

would have to strip the BOM from file1 (but put it back into file5?)
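A minimal sketch of why sort cares, assuming byte-order comparison (as with LC_ALL=C) and a UTF-8 BOM on the first line. `naive_sort` treats the BOM as part of the first line, so it gets sorted into the middle of the output. `bom_aware_sort` is a hypothetical helper that strips the BOM, sorts, and optionally restores it; the `keep_bom` flag stands in for the open question of whether file5 should get the BOM back:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def naive_sort(data: bytes) -> bytes:
    """Sort lines by raw bytes; a leading BOM stays glued to the first line."""
    return b"".join(sorted(data.splitlines(keepends=True)))

def bom_aware_sort(data: bytes, keep_bom: bool = True) -> bytes:
    """Hypothetical sort: strip a leading BOM, sort lines, optionally re-add it."""
    had_bom = data.startswith(UTF8_BOM)
    if had_bom:
        data = data[len(UTF8_BOM):]
    lines = data.splitlines(keepends=True)
    # Give the last line a terminator so it joins cleanly after sorting.
    if lines and not lines[-1].endswith((b"\n", b"\r")):
        lines[-1] += b"\n"
    out = b"".join(sorted(lines))
    return UTF8_BOM + out if (had_bom and keep_bom) else out

data = UTF8_BOM + b"pear\napple\n"
print(naive_sort(data))      # b'apple\n\xef\xbb\xbfpear\n'
print(bom_aware_sort(data))  # b'\xef\xbb\xbfapple\npear\n'
```

Because 0xEF sorts after every ASCII byte, the naive version pushes the BOM into the middle of the output, just as in the cat case.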

And so forth.

If you have a "multifork" filesystem, you can do tagging like this easily,
since the "real payload" doesn't get mixed with the metadata. But traditional
UNIX filesystems are not multifork.


This archive was generated by hypermail 2.1.2 : Wed Apr 24 2002 - 14:25:39 EDT