Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

From: David Starner (starner@okstate.edu)
Date: Wed Apr 24 2002 - 13:16:09 EDT


On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote:
> The Unix and Linux world is very
> opposed to the use of BOM in plain-text files, and if they feel that way
> about UTF-8 they probably feel the same about UTF-16.

Why? The problems with a BOM in UTF-8 have to do with it being an
ASCII-compatible encoding. (I'd guess that if there are any Unixes that
use EBCDIC, the same problems would apply to UTF-EBCDIC.) Pretty much
the only reason one would use UTF-16 is to be compatible with a foreign
system, and then you use the conventions of that system.
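
A minimal sketch of the ASCII-compatibility problem, assuming Python and a
hypothetical helper named starts_with_shebang. It mimics a tool that looks
only at the first two bytes of a script, and shows that a UTF-8 BOM hides
the "#!" such a tool expects. The "utf-8-sig" codec is Python's standard way
of writing UTF-8 with a leading BOM.

```python
import codecs


def starts_with_shebang(data: bytes) -> bool:
    """Hypothetical check, in the spirit of a byte-oriented Unix tool."""
    return data.startswith(b"#!")


script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")         # identical to the ASCII bytes
with_bom = script.encode("utf-8-sig")  # same bytes, prefixed with EF BB BF

print(starts_with_shebang(plain))      # the "#!" is at offset 0
print(starts_with_shebang(with_bom))   # the BOM now occupies offset 0
print(with_bom[:3] == codecs.BOM_UTF8)
```

A UTF-16 file never looks like ASCII to such a tool in the first place, so
the BOM does not take anything away from it.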

Also, look at the output of file:

n2404r.doc: Microsoft Office document data
file.utf8: UTF-8 Unicode English text
file.utf16: Little-endian UTF-16 Unicode English character data
file.iso: data
file_list: ASCII text

There are basically two categories here: data or text. But UTF-16 is not
considered text; it's considered data, like a Word file. Most Unix users
would treat a UTF-16 encoded file the same way: as a format to be
converted from, or edited in a word processor only.
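
The text/data split can be sketched with a small BOM sniffer. This is only
an illustration under stated assumptions: the function name classify and
its labels are hypothetical, and it is not the logic that file actually
uses, which is considerably more involved.

```python
import codecs


def classify(data: bytes) -> str:
    """Hypothetical classifier: label a byte string by BOM, else ASCII or data."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 text (with BOM)"
    # Caveat: a UTF-32 little-endian BOM (FF FE 00 00) also begins with
    # FF FE; this sketch does not try to tell the two apart.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian UTF-16 character data"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian UTF-16 character data"
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        # Unrecognized bytes with no BOM, like file.iso in the listing above.
        return "data"


utf16_sample = codecs.BOM_UTF16_LE + "hello\n".encode("utf-16-le")
print(classify(utf16_sample))
print(classify(b"hello\n"))
print(classify(b"caf\xe9\n"))  # Latin-1 bytes, no BOM
```

In this sketch the BOM is the only thing that lets the UTF-16 sample be
identified at all; without it the bytes would fall through to "data".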

-- 
David Starner - starner@okstate.edu
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)



This archive was generated by hypermail 2.1.2 : Wed Apr 24 2002 - 14:05:11 EDT