On Thu, Jul 22, 1999 at 02:10:16PM -0700, John Cowan wrote:
> Either way, but with an appropriate BOM, and good software will be
> able to cope.
What's a BOM? I tried to look it up on AltaVista and got a lot of religious
references in what seemed like Portuguese ("Bom Jesus"). I suspect that is
not what you are talking about. :-)
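
For anyone with the same question: in this context BOM stands for "byte order mark", the character U+FEFF placed at the start of a stream so a reader can tell which byte order (and, in practice, which encoding) follows. The sketch below is only an illustration of how software might "cope"; the function name sniff_bom is hypothetical and the list of encodings is deliberately limited to UTF-8 and UTF-16.

```python
import codecs

# BOM byte sequences from the standard library, paired with a codec name.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length), or (None, 0) if no BOM is present.

    Hypothetical helper: it only inspects the first bytes and makes no
    attempt to guess the encoding of BOM-less data.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
codec, skip = sniff_bom(sample)
text = sample[skip:].decode(codec) if codec else None
```

A reader that finds no BOM still has to fall back on some default or on out-of-band information, which this sketch leaves open.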
> > Besides, UTF-16 can only contain the first plane.
>
> No, that's UCS-2 (which is moribund). UTF-16 handles planes 0-0x10,
> which is rather more than all the planes there will ever be.
Oh, sorry, I misunderstood. Then I would argue for UTF-8 even more, since you
need to convert the data anyway. And UTF-8 can include legacy code without
modification (assuming it is in 7-bit ASCII).
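
Both points in this exchange can be checked mechanically. The sketch below shows how UTF-16 reaches planes 1 through 0x10 by splitting a code point into a surrogate pair, and that 7-bit ASCII text is byte-for-byte identical when encoded as UTF-8. The function name to_surrogates is made up for illustration.

```python
def to_surrogates(cp: int):
    """Split a code point above plane 0 into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in planes 1 through 0x10")
    v = cp - 0x10000                 # 20 bits remain after the offset
    high = 0xD800 + (v >> 10)        # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits go in the low surrogate
    return high, low


# The last code point of plane 0x10 uses the last surrogate of each range.
last = to_surrogates(0x10FFFF)       # (0xDBFF, 0xDFFF)

# Legacy 7-bit ASCII source needs no conversion to be valid UTF-8.
legacy = "int main(void) { return 0; }"
same_bytes = legacy.encode("ascii") == legacy.encode("utf-8")
```

The second check only holds for pure 7-bit ASCII; text in an 8-bit code page would still need converting.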
> > Even though, strictly
> > speaking, Unicode is 16-bit, the ISO standard (is it 10646?) is 32-bit.
>
> 31-bit. But the codes above 0010FFFF will never be assigned.
Yes. I meant that a 32-bit type can hold both Unicode and ISO 10646 values.
Plus, you can use -1 or other negative values as error codes.
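
A minimal sketch of that idea: since no code point exceeds 0x10FFFF, every valid value is a non-negative number that fits in a signed 32-bit integer, which leaves the negative range free for error codes. The function first_code_point and the choice of -1 are assumptions for illustration, not an established API.

```python
ERR = -1  # hypothetical error code; any negative value would work


def first_code_point(data: bytes) -> int:
    """Return the first code point of UTF-8 data, or ERR if it is malformed."""
    if not data:
        return ERR
    lead = data[0]
    if lead < 0x80:
        n = 1                        # plain ASCII
    elif 0xC2 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF4:
        n = 4
    else:
        return ERR                   # not a valid UTF-8 lead byte
    try:
        # The built-in decoder rejects truncated or ill-formed sequences.
        return ord(data[:n].decode("utf-8"))
    except UnicodeDecodeError:
        return ERR
```

A caller can then test the sign of the result instead of passing a separate status flag around.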
> I think you are confusing wchar_t (a C standard) with TCHAR (a Microsoft
> idea). TCHAR is 16 bits in Unicode mode and 8 bits in "ANSI" (8-bit
> code page) mode.
Yes, I suppose I was.
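
As a side note on the distinction: the width of wchar_t is left to the implementation, so it can be inspected on whatever platform is at hand. The sketch below does that through Python's ctypes module, where c_wchar corresponds to the C wchar_t type. TCHAR, by contrast, is selected at compile time by Microsoft's headers and cannot be observed this way.

```python
import ctypes

# Size of the platform's C wchar_t, in bits. Commonly 16 on Windows and
# 32 on many Unix-like systems, but that is a property of the platform.
wchar_bits = ctypes.sizeof(ctypes.c_wchar) * 8
print("wchar_t is", wchar_bits, "bits wide here")
```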
> > But editors on both systems can handle this minor quirk.
>
> Some editors. Try Notepad (the standard Windows plaintext editor),
> which can cope with UTF-16 fine but is baffled by bare-LF.
I was describing how I work. I was talking about editors I use, not trying
to imply all editors can handle everything. Are you sure Notepad on Win95
can cope with UTF-16? Perhaps on WinNT? Win95 pays only lip service to
Unicode.
For the record, under Windows I use the editor that comes with Visual C++.
Notepad is not exactly a programmer's editor. :-)
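
For files that do have to be opened in an editor that is baffled by bare LF, one workaround is to normalize the line endings first. The sketch below assumes the data uses an ASCII-compatible encoding such as UTF-8 (it would corrupt UTF-16, where LF is a two-byte unit); the function name to_crlf is made up for illustration.

```python
import re

# Match an LF that is not already preceded by a CR.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF, leaving existing CRLF alone."""
    return _BARE_LF.sub(b"\r\n", data)


converted = to_crlf(b"one\ntwo\r\nthree\n")
```

Working on bytes rather than decoded text keeps the rest of the file untouched.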
Adam