Re: ASCII and Unicode lifespan

From: Peter Kirk ([email protected])
Date: Fri May 20 2005 - 06:41:35 CDT

Next message: Peter Kirk: "Re: ASCII and Unicode lifespan"

Previous message: Radovan Garabik: "Re: ASCII and Unicode lifespan"
In reply to: Dean Snyder: "Re: ASCII and Unicode lifespan"
Reply: Doug Ewell: "Re: ASCII and Unicode lifespan"

On 20/05/2005 02:36, Dean Snyder wrote:

> ...
>
>I can, for example, see a future when 32 bit characters are the minimum
>standard and all hardware dealing with text has the same endianness -
>the current default, big endian ;-) In such environments, multiple text
>encoding forms and schemes and BOMs will be superfluous.
>
>
>
In an environment in which all text is represented by 32-bit entities,
endianness is also superfluous, or meaningless, and the fighters of
Lilliput can lay down their weapons at last.

...

>>>Probably the single most important, and extremely simple, step to a
>>>better encoding would be to force all encoded characters to be 4 bytes.
>>>
>>>
>>Naive in the extreme. You do realize, of course, that the entire
>>structure of the internet depends on protocols that manipulate
>>8-bit characters, with mandated direction to standardize their
>>Unicode support on UTF-8?
>>
>>

Actually, much of the Internet infrastructure can still deal only with
7-bit characters, as we have been discussing on another thread. In order
to carry 8-bit data, whether legacy encoded or UTF-8, across the
Internet, it is apparently necessary to insert a low-level "Quoted
Printable" encoding layer that recodes each byte with the top bit set as
three characters. This makes transmission of anything other than ASCII
text grossly inefficient: any UTF-8 encoded Unicode character from
U+0080 upward is transmitted as between six and twelve bytes in this
encoding. If we can tolerate this kind of extra layer to carry 8-bit
character based data on a 7-bit medium, surely we can tolerate a similar
layer to carry 32-bit character data on a 7-bit or 8-bit medium, for a
transitional period until the Internet or its successor is upgraded to
support 32-bit data at its lowest levels. And it should be possible to
devise an encoding for that layer which is far more efficient than UTF-8
over "Quoted Printable". Of course UTF-7 and UTF-8 are themselves
suitable encodings, but I am treating them here as content transfer
encodings rather than as character sets.
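
To put rough numbers on the overhead described above, here is a minimal
Python sketch. It measures single characters sent as UTF-8 inside
Quoted-Printable, and then, purely as an assumed stand-in for a "similar
layer" (no particular layer is proposed above), the average cost per
character of UTF-32 wrapped in Base64. The function names and the sample
characters are illustrative only.

```python
import base64
import quopri


def qp_size(ch: str) -> int:
    """Bytes on the wire for one character sent as UTF-8 in Quoted-Printable."""
    return len(quopri.encodestring(ch.encode("utf-8")))


def b64_utf32_bytes_per_char(text: str) -> float:
    """Average bytes per character for UTF-32 (big endian, no BOM) in Base64.

    Base64 is an assumption chosen for illustration, not a proposal
    from the discussion itself.
    """
    raw = text.encode("utf-32-be")
    return len(base64.b64encode(raw)) / len(text)


# One ASCII character, then characters needing 2, 3 and 4 UTF-8 bytes.
for ch in ("A", "\u00e9", "\u20ac", "\U00010348"):
    print(f"U+{ord(ch):04X}: {qp_size(ch)} bytes")
# U+0041: 1, U+00E9: 6, U+20AC: 9, U+10348: 12

# A run of 300 euro signs: 1200 bytes of UTF-32 become 1600 Base64 bytes.
print(b64_utf32_bytes_per_char("\u20ac" * 300))  # about 5.33
```

On these assumptions a 32-bit form under Base64 costs a little over five
bytes per character whatever the script, which is worse than Quoted
Printable for ASCII and better for everything from U+0080 upward. Line
wrapping added by a real mail transport is ignored here.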

-- 
Peter Kirk
[email protected] (personal)
[email protected] (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Fri May 20 2005 - 06:42:29 CDT