Re: Devanagari

From: DougEwell2@cs.com
Date: Sun Jan 20 2002 - 20:26:38 EST


In a message dated 2002-01-20 16:49:17 Pacific Standard Time,
creativezeal@hotmail.com writes:

> The point was that a UTF-8 encoded HTML file for an English web page
> carrying say 10 gifs would have a file size one-third that for a Devanagari
> web page with the same no. of gifs...
> Therefore transmission of a Devanagari web page over a network would take
> thrice as long as that of an English web page using the same images and
> presenting the same information.

This conclusion ignores two obvious points, which Asmus already made:

(1) The 10 GIFs, each of which may well be larger than the HTML file, take
the same amount of space regardless of the encoding of the HTML file. The
total number of bytes involved in transmitting a Web page includes
everything, HTML and graphics, but the purported "factor of 3" applies only
to the HTML.

(2) The markup in an HTML file, which comprises a significant portion of the
file, is all ASCII. So the "factor of 3" doesn't even apply to the entire
HTML file, only the plain-text content portion.

In addition, text written in Devanagari includes plenty of instances of
U+0020 SPACE, plus CR and/or LF, each of which occupies one byte
regardless of the encoding.
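
As a quick illustration of those byte counts, here is a minimal Python sketch.
The sample phrase and the helper name utf8_len are my own choices for the
example, not anything from the thread. It shows a Devanagari letter taking
three bytes in UTF-8 while SPACE and LF take one each.

```python
# Illustrative sample only: two Devanagari words, one SPACE, one LF.
# Written with \u escapes so the source file itself stays ASCII.
SAMPLE = (
    "\u0928\u092e\u0938\u094d\u0924\u0947"   # first word, 6 code points
    " "                                      # U+0020 SPACE
    "\u0926\u0941\u0928\u093f\u092f\u093e"   # second word, 6 code points
    "\n"                                     # LF
)

def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Per-character byte counts, keyed by code point for readability.
per_char = {f"U+{ord(ch):04X}": utf8_len(ch) for ch in SAMPLE}

if __name__ == "__main__":
    for code_point, size in per_char.items():
        print(code_point, size)
    print("code points:", len(SAMPLE), "bytes:", utf8_len(SAMPLE))
```

So even the plain-text portion averages somewhat less than three bytes per
character once spaces and line breaks are counted.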

I think before worrying about the performance and storage effect on Web pages
due to UTF-8, it might help to do some profiling and see what the actual
impact is.
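
A rough sketch of what such profiling might look like, in Python. Everything
here is assumed for illustration: the function names page_bytes and
size_ratio, the tiny HTML template, the sample text, and the figure of
5000 bytes per GIF. Real measurements would use actual pages and image files.

```python
from typing import Sequence

def page_bytes(html: str, image_sizes: Sequence[int]) -> int:
    """Total bytes for one page: UTF-8 encoded HTML plus all image files."""
    return len(html.encode("utf-8")) + sum(image_sizes)

def size_ratio(html_a: str, html_b: str, image_sizes: Sequence[int]) -> float:
    """Ratio of total page weight (a over b) when both use the same images."""
    return page_bytes(html_a, image_sizes) / page_bytes(html_b, image_sizes)

# Hypothetical inputs: identical ASCII markup, different text content.
TEMPLATE = "<html><body><p>{}</p></body></html>"
ENGLISH_TEXT = "hello world " * 100
DEVANAGARI_TEXT = (
    "\u0928\u092e\u0938\u094d\u0924\u0947 "
    "\u0926\u0941\u0928\u093f\u092f\u093e "
) * 100
GIF_SIZES = [5000] * 10  # assumed: ten GIFs of 5000 bytes each

english_page = TEMPLATE.format(ENGLISH_TEXT)
devanagari_page = TEMPLATE.format(DEVANAGARI_TEXT)

if __name__ == "__main__":
    ratio = size_ratio(devanagari_page, english_page, GIF_SIZES)
    print("Devanagari page / English page, total bytes: %.3f" % ratio)
```

With these made-up numbers the images dominate the total, so the whole-page
ratio comes out far below 3. How close that is to reality depends entirely
on the pages being measured.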

-Doug Ewell
 Fullerton, California


