RE: Translated IUC10 Web pages: Experimental Results

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Wed Feb 05 1997 - 08:04:16 EST


On Tue, 4 Feb 1997 Chris Pratley wrote:

> A few comments on these html files and Word97's capabilities.
>
> Word97 supports UCS2 (little-endian) for textfiles
>
> Word97 will not open big-endian UCS2:
> http://194.75.134.50/unicode/iuc10/x-ucs2.html

Very interesting. I thought that the default for UCS2 was
big-endian. Even on little-endian machines, it would have cost
almost nothing to adopt that default from the start. And if
there really are plans to make NT the main OS in the world,
I hope it is designed so that it doesn't depend on little-endian
hardware.

The minimum would be to support reading big-endian UCS2 as
well as little-endian, if properly tagged, and to always
write out a tag (the byte order mark, U+FEFF). The current
situation is absolutely hilarious, and if it is not corrected
immediately, it could cause very bad publicity for Unicode.
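For illustration, the "minimum" above might look like the following sketch: always write a byte order mark, and accept either byte order on input when the mark is present. The function names are hypothetical. It assumes BMP-only text, for which Python's UTF-16 codecs produce the same bytes as UCS2, and, following the big-endian default mentioned earlier, it treats untagged input as big-endian.

```python
import codecs


def write_ucs2_tagged(text: str, big_endian: bool = True) -> bytes:
    """Encode BMP-only text as UCS2, always prefixed with a byte order mark."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS2 covers only U+0000..U+FFFF")
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # FE FF ...
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")      # FF FE ...


def read_ucs2_tagged(data: bytes) -> str:
    """Decode UCS2 in either byte order, using the tag to decide."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # Assumption: untagged data is taken to be big-endian (the default).
    return data.decode("utf-16-be")
```

A reader built this way opens a little-endian file written on an Intel machine and a big-endian file written elsewhere equally well, e.g. `read_ucs2_tagged(write_ucs2_tagged("Grüße", big_endian=False))` gives back `"Grüße"`.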

It is very sad to see that even though this problem has
been known for years, and appropriate specifications
and provisions have been included in the standard, a
company that has been strongly involved in creating that
standard and that has more resources than most others
cannot manage even the minimum necessary to let things
work the way they were designed.

> Word97 supports UTF-8 for HTML (but not UCS2)
>
> This is why Word opens the true UTF-8 sites such as
> http://www.cm.spyglass.com/unicode/iuc10/x-utf8.html
> as Web pages, and the UCS2 little-endian pages as plain text.
>
> Our assumption was that UTF-8 was the only Web-safe encoding that was
> reasonably likely to be adopted by browsers in the near future. Is that
> the consensus, or are raw UCS2 encodings being considered actively by
> people on this alias?

HTTP, the main protocol used to serve Web documents, has no
problem at all transmitting UCS2 or any other kind of "binary" data.
On a modern server, the character encoding can easily be included
in the HTTP header (the charset parameter of Content-Type). Also,
a browser can easily use a simple heuristic to recognize Unicode
and its byte order: the text starts (hopefully!) with FEFF or
(hopefully not!) with FFFE.
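As a rough sketch of how a browser might combine the two mechanisms: an explicit charset in the Content-Type header wins, and otherwise the first two bytes are checked for the byte order mark. The function name is hypothetical, and the final fallback to ISO-8859-1 (the HTTP/1.1 default for text types in RFC 2068) is an assumption, not something stated above.

```python
from email.message import Message
from typing import Optional


def guess_encoding(body: bytes, content_type: Optional[str] = None) -> str:
    # 1. An explicit charset parameter in the Content-Type header wins.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    # 2. Otherwise sniff the byte order mark U+FEFF.
    if body.startswith(b"\xfe\xff"):
        return "utf-16-be"  # FE FF: big-endian (the hoped-for case)
    if body.startswith(b"\xff\xfe"):
        return "utf-16-le"  # FF FE: little-endian
    # 3. Assumed fallback for untagged, unlabeled text.
    return "iso-8859-1"
```

The header check comes first because a server that labels its documents leaves nothing for the browser to guess; the byte order mark only matters for files served without a charset.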

Hope this helps. Regards, Martin.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT