Re: Why is "endianness" relevant when storing data on disks but not when in memory?

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Sun, 06 Jan 2013 08:51:04 +0900

On 2013/01/06 7:21, Costello, Roger L. wrote:

> Does this mean that when exchanging Unicode data across the Internet the endianness is not relevant?
>
> Are these stated correctly:
>
> When Unicode data is in a file we would say, for example, "The file contains UTF-32BE data."
>
> When Unicode data is in memory we would say, "There is UTF-32 data in memory."
>
> When Unicode data is sent across the Internet we would say, "The UTF-32 data was sent across the Internet."

The first two are correct. The third is wrong. The
Internet deals with data as a series of bytes, and by its nature has to
pass data between big-endian and little-endian machines. Therefore,
endianness is very important on the Internet. So you would say:

"The UTF-32BE data was sent across the Internet."
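As a rough illustration of why the label has to carry the byte order, here
is a minimal Python sketch (the sample character and the variable names are
my own choices, not anything from the standard). It serializes the same
code point both ways and then tries to read the big-endian bytes with the
wrong label:

```python
# One code point, U+0041, serialized with each byte order.
text = "A"

be = text.encode("utf-32-be")  # most significant byte first
le = text.encode("utf-32-le")  # least significant byte first

print(be.hex(" "))  # 00 00 00 41
print(le.hex(" "))  # 41 00 00 00

# Reading big-endian bytes as little-endian yields the value 0x41000000,
# which is above U+10FFFF, so Python's decoder rejects it.
try:
    be.decode("utf-32-le")
    wrong_label_ok = True
except UnicodeDecodeError:
    wrong_label_ok = False

print(wrong_label_ok)  # False
```

The receiver only sees the four bytes, so without "BE" or "LE" in the
label (or a BOM) it cannot know which reading was intended.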

Actually, as far as I'm aware, the labels UTF-16BE and UTF-16LE were
first defined in the IETF; see
http://tools.ietf.org/html/rfc2781#appendix-A.1.

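Because of these byte-order issues, Internet protocols mostly prefer
UTF-8 over UTF-16 (or UTF-32), and actual data is also heavily UTF-8.
UTF-8 is defined directly as a sequence of bytes, so it has no
big-endian and little-endian variants. So it would be better to say: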

When Unicode data is sent across the Internet we would say, "The UTF-8
data was sent across the Internet."
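To make the contrast concrete, a small Python sketch follows (again, the
sample character and variable names are just assumptions for the example).
It encodes one code point as UTF-8 and as both UTF-16 byte orders:

```python
# One code point, U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
text = "\u00e9"

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex(" "))      # c3 a9
print(utf16_be.hex(" "))  # 00 e9
print(utf16_le.hex(" "))  # e9 00
```

UTF-16 produces two different byte sequences depending on the byte order,
while UTF-8 has exactly one, so a plain "UTF-8" label is enough.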

Regards, Martin.
Received on Sat Jan 05 2013 - 17:55:02 CST
