Re: Unicode in source

From: G. Adam Stanislav (adam@whizkidtech.net)
Date: Thu Jul 22 1999 - 18:36:24 EDT


On Thu, Jul 22, 1999 at 02:10:16PM -0700, John Cowan wrote:
> Either way, but with an appropriate BOM, and good software will be
> able to cope.

What's a BOM? I tried to look it up on AltaVista and got a lot of religious
references in what seemed like Portuguese ("Bom Jesus"). I suspect that is
not what you are talking about. :-)
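A BOM is the byte order mark, U+FEFF, written at the very start of a text file. Its serialized bytes tell a reader whether the file is UTF-16 big-endian or little-endian, and they also act as a signature for UTF-8 and UTF-32. Below is a minimal Python sketch of BOM sniffing. The function name `sniff_bom` is hypothetical. The byte sequences come from the standard `codecs` module constants.

```python
import codecs
from typing import Optional, Tuple

# Longest signatures first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) for a leading BOM, or (None, 0) if none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

raw = "\ufeffint main(void);".encode("utf-16-le")
enc, skip = sniff_bom(raw)
print(enc, raw[skip:].decode(enc))  # utf-16-le int main(void);
```

A file with no BOM gets `(None, 0)`. The caller then has to fall back on some other assumption, such as plain UTF-8.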

> > Besides, UTF-16 can only contain the first plane.
>
> No, that's UCS-2 (which is moribund). UTF-16 handles planes 0-0x10,
> which is rather more than all the planes there will ever be.
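UTF-16 reaches planes 1 through 0x10 with surrogate pairs. A code point above U+FFFF is first reduced by 0x10000, which leaves 20 bits. Those bits are then split into two 16-bit units taken from the reserved range D800-DFFF. The Python sketch below shows the arithmetic. The function name `to_utf16_units` is hypothetical.

```python
from typing import List

def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]               # plane 0 (BMP): one unit, same as UCS-2
    v = cp - 0x10000              # 20 bits remain: planes 1-0x10
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return [high, low]

print([hex(u) for u in to_utf16_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```

The 20 bits give 2^20 supplementary code points, which is exactly 16 planes on top of the BMP. That is why U+10FFFF is the ceiling.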

Oh, sorry, I misunderstood. Then I would argue for UTF-8 even more, since you
need to convert the data anyway. Besides, UTF-8 can include legacy code without
modification (assuming it is in 7-bit ASCII).
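The ASCII compatibility comes from how UTF-8 is designed. Code points below 0x80 encode as the identical single byte. Every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII character. The hand-written Python encoder below makes this visible; the name `utf8_encode` is hypothetical. In practice, Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])  # 7-bit ASCII: the byte is unchanged
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

legacy = b'puts("hello");'                                 # 7-bit ASCII source line
print(b"".join(utf8_encode(c) for c in legacy) == legacy)  # True
print(utf8_encode(0xE9).hex())                             # c3a9
```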

> > Even though, strictly
> > speaking, Unicode is 16-bit, the ISO standard (is it 10646?) is 32-bit.
>
> 31-bit. But the codes above 0010FFFF will never be assigned.

Yes. I meant that 32 bits can hold both Unicode and ISO 10646 code points. Plus,
you can use -1 or other negative values as error codes.
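Negative error codes work because every code point fits in a signed 32-bit integer with the sign bit left free. That holds even for the full 31-bit ISO 10646 range. Below is a minimal Python sketch of a UTF-8 decoder that returns -1 for malformed input. The name `decode_one` and the `ERROR` constant are hypothetical. In C, the natural return type would be a signed 32-bit integer such as `int32_t`.

```python
from typing import Tuple

ERROR = -1  # hypothetical sentinel; no code point is negative

def decode_one(buf: bytes, i: int) -> Tuple[int, int]:
    """Decode one UTF-8 sequence at buf[i].

    Returns (code_point, next_index), or (ERROR, i + 1) on malformed input.
    """
    b0 = buf[i]
    if b0 < 0x80:
        return b0, i + 1                      # ASCII
    if 0xC2 <= b0 <= 0xDF:
        n, cp, lo = 1, b0 & 0x1F, 0x80        # 2-byte sequence
    elif 0xE0 <= b0 <= 0xEF:
        n, cp, lo = 2, b0 & 0x0F, 0x800       # 3-byte sequence
    elif 0xF0 <= b0 <= 0xF4:
        n, cp, lo = 3, b0 & 0x07, 0x10000     # 4-byte sequence
    else:
        return ERROR, i + 1                   # invalid lead byte
    if i + n >= len(buf):
        return ERROR, i + 1                   # truncated sequence
    for k in range(1, n + 1):
        b = buf[i + k]
        if (b & 0xC0) != 0x80:
            return ERROR, i + 1               # not a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    # Reject overlong forms, surrogates, and anything past U+10FFFF.
    if cp < lo or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return ERROR, i + 1
    return cp, i + n + 1

print(decode_one("€".encode("utf-8"), 0))  # (8364, 3)
print(decode_one(b"\xc0\xaf", 0))          # (-1, 1)
```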

> I think you are confusing wchar_t (a C standard) with TCHAR (a Microsoft
> idea). TCHAR is 16 bits in Unicode mode and 8 bits in "ANSI" (8-bit
> code page) mode.

Yes, I suppose I was.
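The distinction matters because the C standard leaves the width of wchar_t to the implementation. It is 16 bits with Microsoft's compilers and commonly 32 bits on Unix-like systems. The Python check below uses the standard `ctypes` module, whose `c_wchar` type corresponds to the platform's C `wchar_t`. The helper `wchar_units` is hypothetical. It assumes that a 16-bit wchar_t holds UTF-16 and that a 32-bit one holds one code point per unit.

```python
import ctypes
import sys

# ctypes.c_wchar corresponds to the C compiler's wchar_t on this platform.
WCHAR_BYTES = ctypes.sizeof(ctypes.c_wchar)

def wchar_units(text: str) -> int:
    """Count the wchar_t units needed for text, excluding the terminator."""
    if WCHAR_BYTES == 2:
        # 16-bit wchar_t (e.g. Windows): UTF-16, non-BMP characters take two units
        return len(text.encode("utf-16-le")) // 2
    # 32-bit wchar_t (assumed UTF-32): one unit per code point
    return len(text)

print(sys.platform, WCHAR_BYTES, wchar_units("A\U0001D11E"))
```

TCHAR has no Python counterpart. It is a Microsoft typedef from `<tchar.h>` that resolves to wchar_t when `_UNICODE` is defined and to char otherwise.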

> > But editors on both systems can handle this minor quirk.
>
> Some editors. Try Notepad (the standard Windows plaintext editor),
> which can cope with UTF-16 fine but is baffled by bare-LF.

I was describing how I work: I meant the editors I use, not that all
editors can handle everything. Are you sure Notepad on Win95 can cope
with UTF-16? Perhaps on WinNT? Win95 pays only lip service to
Unicode.
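Notepad's trouble with bare LF comes from the Windows convention of ending text lines with CR LF. The Python sketch below converts Unix line endings for such editors without doubling lines that already end in CR LF. The helper name `to_crlf` is hypothetical.

```python
def to_crlf(text: str) -> str:
    """Convert bare LF (and existing CR LF) line endings to CR LF."""
    # Normalize first so existing CR LF pairs do not become CR CR LF.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")

src = "int x;\nint y;\r\n"
print(repr(to_crlf(src)))  # 'int x;\r\nint y;\r\n'
```

When writing the result with Python's `open()`, pass `newline=""` so the runtime does not translate the line endings a second time.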

For the record, under Windows I use the editor that comes with Visual C++.
Notepad is not exactly a programmer's editor. :-)

Adam



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT