Re: Translated IUC10 Web pages: Experimental Results

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Mon Feb 10 1997 - 09:31:04 EST


On Mon, 10 Feb 1997 unicode@Unicode.ORG wrote:

> Asmus Freytag wrote:
>
> > ... As a consequence, Notepad does not read
> > Unicode files w/o a BOM (even little endian ones)...
>
> To my surprise a week or two ago I discovered that it does
> recognise Unicode files without the BOM. I think it looks at
> the line endings. This does not detract from the argument.

Does this include correct (i.e. BIG-endian) files? That would
definitely be some good news.
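
For illustration only, here is a rough sketch of how a line-ending heuristic of the kind guessed at above might tell the two byte orders apart. This is an assumption about how such a check could work, not a description of what Notepad actually does, and the function name guess_ucs2_byte_order is made up for this example.

```python
from typing import Optional


def guess_ucs2_byte_order(data: bytes) -> Optional[str]:
    """Guess the byte order of BOM-less UCS-2 text from its line endings.

    Hypothetical heuristic: U+000A and U+000D appear as 00 0A / 00 0D in
    big-endian data and as 0A 00 / 0D 00 in little-endian data.
    """
    big = little = 0
    # Walk the data one 16-bit code unit at a time.
    for i in range(0, len(data) - 1, 2):
        pair = (data[i], data[i + 1])
        if pair in ((0x00, 0x0A), (0x00, 0x0D)):
            big += 1
        elif pair in ((0x0A, 0x00), (0x0D, 0x00)):
            little += 1
    if big > little:
        return "big"
    if little > big:
        return "little"
    return None  # no line endings seen, or a tie: cannot decide


sample = "line one\r\nline two\r\n"
print(guess_ucs2_byte_order(sample.encode("utf-16-be")))
print(guess_ucs2_byte_order(sample.encode("utf-16-le")))
```

A file with no line endings gives the heuristic nothing to work with, which is one reason a guess like this cannot replace a defined byte order.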

> And, while I am writing, here is another point:
>
> > On the web a higher level of precision is needed. Specifying protocols
> > in terms of serialized byte streams and therefore requiring MSB
> > canonical byte ordering increases data security and in these situations,
> > the overhead of transposition (in the worst case twice) is fully acceptable.
> > It's hard to find any arguments there.
>
> It is a pity that this complicates the procedure for web publishing,
> which is usually:
> - Edit the file on your own computer
> - Put it in the right directory on the server (via a remotely
> mounted disc, or FTP usually)
> If you want to send big-endian UCS2 in the HTTP stream, you need
> a text editor that can save text as bytes in big-endian UCS2, or arrange
> for a conversion at publishing time, or by the web server.
> We need a balance between efficiency (number of bytes for East Asian
> characters) and user perceptions of complication (why are there four
> formats for Unicode web pages (big-endian, little-endian, UTF-8, &#nnnn;)?!).
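
To make the four formats and their sizes concrete, here is a small sketch that serializes the same three East Asian characters each way. The sample string is arbitrary, and Python's utf-16-be / utf-16-le codecs are used as stand-ins for UCS2, which is an assumption that holds only because these characters need no surrogates.

```python
# Three East Asian characters (U+65E5 U+672C U+8A9E), chosen only as an example.
text = "\u65e5\u672c\u8a9e"

forms = {
    # utf-16-be / utf-16-le match UCS2 here because no surrogates are needed.
    "UCS2 big-endian": text.encode("utf-16-be"),
    "UCS2 little-endian": text.encode("utf-16-le"),
    "UTF-8": text.encode("utf-8"),
    # Decimal numeric character references, as in &#nnnn;
    "numeric references": "".join(f"&#{ord(c)};" for c in text).encode("ascii"),
}

for name, data in forms.items():
    print(f"{name:20} {len(data):2} bytes  {data.hex(' ')}")
```

For this sample the sizes come out as 6, 6, 9 and 24 bytes respectively.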

I think it is pretty clear that any OS and any application that
wants to claim that it cares about interoperability on the
Internet (and who would not, nowadays), has to store UCS2
in plain text (which includes plain-text-based formats such
as HTML) in BIG-endian order (unless the file system has a
device that remembers whether a 16-bit write or an 8-bit write
was used to produce the file; not even old record-oriented
file systems had such a device, as far as I know).
When in doubt between efficiency and user convenience, choose
user convenience. Transposition is extremely fast. Any hard
disk is way slower.
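
As a minimal sketch of what the transposition amounts to, the following swaps the two bytes of every 16-bit unit. The function name swap_ucs2_bytes is hypothetical, and the code is only one straightforward way to do it, not a tuned implementation.

```python
def swap_ucs2_bytes(data: bytes) -> bytes:
    """Exchange the two bytes of every 16-bit unit (little <-> big endian)."""
    if len(data) % 2:
        raise ValueError("UCS2 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each unit moves to the front
    swapped[1::2] = data[0::2]  # first byte of each unit moves to the back
    return bytes(swapped)


little = "Hi".encode("utf-16-le")  # bytes as written by a little-endian machine
big = swap_ucs2_bytes(little)      # bytes in big-endian order
print(little.hex(" "), "->", big.hex(" "))
```

Applying the swap twice gives back the original bytes, so the worst case of transposing once on saving and once on loading loses nothing.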

Regards, Martin.


