> -----Original Message-----
> From: Marco Mussini [mailto:marco.mussini@vim.tlt.alcatel.it]
> In an application that runs mostly in
> Chinese/Japanese, this means wasting space.
Not as much as you'd think, at least for modern applications that aren't
focused on text processing. A typical localized Japanese application
contains a great many ASCII-based strings. You only start to pay a real
penalty for storing Japanese in UTF-8 when you're processing large amounts
of pure plain text (word processing a book, etc.).
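To put rough numbers on that trade-off, here is a small Python sketch that
compares byte counts. The sample strings and the helper name `encoded_sizes`
are my own illustrations, not taken from any real application.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in 16-bit Unicode (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


# Hypothetical sample strings: pure ASCII, pure katakana, and markup mixed
# with katakana, which is closer to what a localized UI resource looks like.
samples = ["File", "ファイル", "<b>ファイル</b>"]

for s in samples:
    print(s, encoded_sizes(s))
```

Each katakana character costs three bytes in UTF-8 against two in the 16-bit
form, while each ASCII character costs one against two, so the mixed string
still comes out smaller in UTF-8.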
> Unfortunately most OSs that do offer today support for some flavor of
> Unicode in their API offer it in UTF-8 and not UCS16.
Amongst the Un*xes, yes. But the Win32 API is using 16-bit values, not
UTF-8. The CE version is Unicode only. NT version is Unicode or ANSI [sic].
95/98 is stripped down, but the functions that do support Unicode do it in
16-bit.
COM calls (except DAO) are also done with 16-bit wide Unicode, and COM is
the foundation of ActiveX.
Most older OSes (and RDBMSes) offer Unicode through UTF-8 because they don't
have an easy way to convert their legacy 8-bit character interfaces to 16-bit
without breaking compatibility.
BeOS, a newer OS, uses UTF-8 internally, though; the developer's guide
claims that this is for compatibility (with ASCII) and space-saving reasons.
> Java internally works in UCS16 (ideally) but it is likely to communicate
> to the external world in UTF-8.
Class files store strings in quasi-UTF-8 format. Interestingly, NTFS stores
in 16-bit format. Long-filename FAT and FAT32 are designed to handle 16-bit
Unicode, but that was never implemented in Windows 95.
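For the curious, the "quasi" part can be sketched in Python. This assumes the
class-file form is what the JVM specification calls modified UTF-8: U+0000 is
written as the two bytes C0 80, and every other 16-bit code unit is encoded
on its own. The function name `modified_utf8` is mine, and this is an
illustration rather than the JVM's own code.

```python
def modified_utf8(text: str) -> bytes:
    """Encode text one 16-bit code unit at a time, with U+0000 as C0 80."""
    out = bytearray()
    # View the string as big-endian 16-bit code units, the way Java sees it.
    units = text.encode("utf-16-be", "surrogatepass")
    for i in range(0, len(units), 2):
        u = (units[i] << 8) | units[i + 1]
        if 0x0001 <= u <= 0x007F:
            out.append(u)                                  # one byte, plain ASCII
        elif u <= 0x07FF:
            # Two bytes. u == 0 also lands here, giving C0 80.
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            # Three bytes, including each half of a surrogate pair.
            out += bytes([0xE0 | (u >> 12),
                          0x80 | ((u >> 6) & 0x3F),
                          0x80 | (u & 0x3F)])
    return bytes(out)


print(modified_utf8("A\x00").hex())
print(modified_utf8("日").hex())
```

For text without U+0000 or characters outside the 16-bit range, the result is
the same as ordinary UTF-8.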
> Is it reasonable to use UCS16 for external communications?
So long as you know the medium will pass all 8-bit values untouched and both
ends know what byte-order to expect, sure.
If the U+FEFF BOM is at the beginning of the file, both Communicator and
Explorer will autodetect it. XML processors also understand this and will
auto-switch to UCS-2.
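A receiver can do the same kind of autodetection in a few lines. The sketch
below is in Python and assumes the payload is 16-bit Unicode; the function
name `decode_16bit` and the `default_order` fallback (the byte order both ends
agreed on in advance) are my own choices, not something browsers or XML
processors expose.

```python
import codecs


def decode_16bit(data: bytes, default_order: str = "big") -> str:
    """Decode 16-bit Unicode, using a leading U+FEFF BOM to pick the byte order."""
    if data.startswith(codecs.BOM_UTF16_BE):      # FE FF: big-endian
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):      # FF FE: little-endian
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to the order agreed on out of band.
    codec = "utf-16-be" if default_order == "big" else "utf-16-le"
    return data.decode(codec)


big = codecs.BOM_UTF16_BE + "日本".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "日本".encode("utf-16-le")
print(decode_16bit(big) == decode_16bit(little))
```

Without a BOM, guessing the wrong order does not fail; it silently yields
different characters, which is why both ends have to agree.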