Re: Unicode in source

From: John Cowan (cowan@locke.ccil.org)
Date: Thu Jul 22 1999 - 22:38:50 EDT


G. Adam Stanislav scripsit:

> What's a BOM? I tried to look it up on AltaVista and got a lot of religious
> references in what seemed like Portuguese ("Bom Jesus"). I suspect that is
> not what you are talking about. :-)

The Byte Order Mark character, FEFF, also known as ZERO WIDTH NON-BREAKING
SPACE, as close to a no-op as you can get. Its byte-reversed form
FFFE is a non-character, so if you always write an FEFF at the beginning
of any file, then any reader that reads FFFE knows to byte-swap the rest
of the input.
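
A rough sketch of that reader logic in Python, in case it helps. The names
swap16 and read_units are made up for illustration, and the "native"
parameter just stands in for whatever byte order the reading machine
happens to use; none of this comes from any particular editor or library.

```python
def swap16(unit):
    """Exchange the two bytes of a 16-bit code unit."""
    return ((unit & 0xFF) << 8) | (unit >> 8)


def read_units(data, native="big"):
    """Return 16-bit code units from data, using a leading BOM to fix byte order.

    native is the byte order the reader assumes ("big" or "little").
    A trailing odd byte, if any, is ignored.
    """
    # Read every pair of bytes in the reader's own byte order.
    units = [int.from_bytes(data[i:i + 2], native)
             for i in range(0, len(data) - 1, 2)]
    if not units:
        return []
    # FFFE is not a character, so seeing it first means the writer used
    # the opposite byte order: swap every unit, including the BOM itself.
    if units[0] == 0xFFFE:
        units = [swap16(u) for u in units]
    # Drop the BOM (now FEFF) so only the real text remains.
    if units[0] == 0xFEFF:
        units = units[1:]
    return units


# The same text written in either byte order reads back identically.
little = "\ufeffHi".encode("utf-16-le")
big = "\ufeffHi".encode("utf-16-be")
print(read_units(little), read_units(big))
```

Input with no BOM at all is simply taken to be in the reader's own byte
order here, which is a simplification.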

> Are you sure Notepad on Win95
> can cope with UTF-16? Perhaps on WinNT? Win95 pays only lip service to
> Unicode.

Yes, sorry, WinNT supports UTF-16-LE and writes a correct BOM
(first two bytes are FF FE), but it will not process UTF-16-BE
even *with* a correct BOM; it assumes the local code page instead.
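
For comparison, here is a small sketch of what the two serializations look
like on disk, and of a decoder that honors the BOM in either order. It uses
Python's standard codecs module; the sample text "Hi" and the variable
names are only for illustration. U+FEFF comes out as bytes FF FE in
UTF-16-LE and as FE FF in UTF-16-BE.

```python
import codecs

text = "Hi"

# Prefix each serialization with the BOM for its byte order.
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(le[:2].hex(), be[:2].hex())

# The generic "utf-16" codec reads the BOM, picks the byte order,
# and strips the BOM from the decoded result.
decoded_le = le.decode("utf-16")
decoded_be = be.decode("utf-16")
print(decoded_le == decoded_be == text)
```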

> For the record, under Windows I use the editor that comes with Visual C++.
> Notepad is not exactly a programmer's editor. :-)

Yeah, we've already established that I'm not an expert, and now I'm not
a programmer either. Sheesh.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


