Re: UTF-8 vs UTF-16...? (Was: Feeling good about SML)

From: John Cowan
Date: Mon Nov 22 1999 - 10:36:30 EST

MURATA Makoto wrote:

> I prefer UTF-16, since XML documents in legacy encodings never parse
> as UTF-16 and those in UTF-16 never parse as legacy encodings.

This seems confusing, especially with Unicode 3.0 where so much
of the BMP is now in use. Invalid UTF-8 is easy to spot, but
I think it would be easy to accept any non-ISO-2022 legacy
encoding (SJIS, e.g.) as UTF-16 and produce nonsense.
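
A minimal Python sketch of that concern, under stated assumptions: the sample string and the helper name decodes_as are made up for illustration, and the check looks only at raw byte decoding, ignoring any XML declaration or byte order mark that a real parser would inspect first. It relies on the fact that a double-byte Shift_JIS lead byte (0x81-0x9F or 0xE0-0xEF) is never a legal first byte of a UTF-8 sequence, yet as the high byte of a big-endian UTF-16 code unit it always lands outside the surrogate range.

```python
# Hypothetical sample: a short Japanese string stored as Shift_JIS bytes.
text = "日本語のテキスト"
sjis_bytes = text.encode("shift_jis")  # eight double-byte characters, 16 bytes


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a well-formed byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# UTF-8 rejects the data: the first byte is not a valid UTF-8 start byte.
print("valid UTF-8:   ", decodes_as(sjis_bytes, "utf-8"))

# Big-endian UTF-16 accepts the same bytes, pairing them into code units
# that decode to unrelated BMP characters.
print("valid UTF-16BE:", decodes_as(sjis_bytes, "utf-16-be"))
print("same text back:", sjis_bytes.decode("utf-16-be") == text)
```

The UTF-16 decode succeeds but yields text unrelated to the original, which is the "nonsense" case described above.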

> As Tim knows very well, UTF-16 has a number of problems about byte
> ordering. On the other hand, UTF-8 is not free from such problems.
> UTF-8 from Microsoft appears to begin with the zero-width non-breaking
> space always ;-(

ISO 10646 actually blesses this, although Unicode does not.
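
For illustration, a short Python sketch of what that leading character looks like on the wire and how a reader might tolerate it. The document bytes here are a made-up example. Python's "utf-8-sig" codec is one existing way to drop the signature, and it is shown as an option, not as anything either standard requires.

```python
import codecs

# U+FEFF (zero-width no-break space) encodes to three bytes in UTF-8.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Hypothetical document as such a tool might write it: signature first.
data = codecs.BOM_UTF8 + b"<doc/>"

# Plain UTF-8 decoding keeps U+FEFF as the first character of the text.
plain = data.decode("utf-8")

# The "utf-8-sig" codec strips a leading signature if one is present.
stripped = data.decode("utf-8-sig")

print(repr(plain))     # begins with '\ufeff'
print(repr(stripped))  # the markup alone
```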


John Cowan
        Schlingt dreifach einen Kreis vom dies! / Schliess eurer Aug vor heiliger Schau
        Den er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
                        -- Coleridge (tr. Politzer)
