Re: 32nd bit & UTF-8

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Tue Jan 18 2005 - 06:00:33 CST


    On Monday, January 17th, 2005, at 18:06Z, Hans Aberg wrote:

    > Are there any good reasons for UTF-[8] to exclude the 32nd bit of
    > an encoded 4-byte value?

    The ISO/IEC 10646 framework: UCS-4 code positions are 31-bit values
    (the high bit is always zero), so there is no 32nd bit to encode.

    > I.e., the 6-byte combinations
    > 111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    > where the first x = 1.

    > With a full 32-bit encoding, one can also use UTF-8 to encode
    > binary data.

    Why?
    Look, I have two computers.
    One mostly runs DOS software written in Turbo Pascal, where dealing
    with 32-bit unsigned data is a nightmare (there is no built-in type
    for it): I have to drop down to assembly for every operation on such
    "binary data", or else use the 64-bit signed type through the FPU,
    with a noticeable performance hit. On that machine I very much
    prefer "16-bit binary data" ;-).
    Of course, in the real world I use streams (including counted
    strings) of 8-bit data, like everybody else.

    The other has a 64-bit architecture. I have trouble reconciling your
    proposition above (about "full" 32 bits) with it. In fact, I am
    already entangled with software that was designed as a "unified
    architecture" yet foresaw only the use of 32-bit integers and
    pointers.
    So I beg your pardon, but I feel a bit angry about your proposal.
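
    (For concreteness, here is a minimal C sketch, purely illustrative
    and of course not conformant UTF-8, of what the proposed 6-byte form
    would mean: lead bytes FC..FF, where FE/FF are the new ones carrying
    the 32nd bit.)

        #include <stdint.h>

        /* Hypothetical (non-conformant!) 6-byte form
         * 111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx,
         * giving 2 + 5*6 = 32 value bits. Lead bytes FE/FF ("first
         * x = 1") would carry values with the 32nd bit set, which
         * UCS-4 and hence UTF-8 exclude. */
        static void encode32_hypothetical(uint32_t v, uint8_t out[6])
        {
            out[0] = 0xFC | (uint8_t)(v >> 30);          /* bits 31..30 */
            out[1] = 0x80 | (uint8_t)((v >> 24) & 0x3F); /* bits 29..24 */
            out[2] = 0x80 | (uint8_t)((v >> 18) & 0x3F);
            out[3] = 0x80 | (uint8_t)((v >> 12) & 0x3F);
            out[4] = 0x80 | (uint8_t)((v >>  6) & 0x3F);
            out[5] = 0x80 | (uint8_t)( v        & 0x3F); /* bits 5..0 */
        }

    (Note that for any value below 2^26 this 6-byte form merely
    duplicates a shorter encoding in non-shortest form, which is the
    ambiguity I come back to below.)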

    > It also simplifies somewhat the implementation of
    > Unicode in lexer generators (such as Flex): The leading byte then
    > covers all 256 combinations. All 2^32 numbers should probably be
    > there for generating proper lexer error messages.

    I am not sure I understand you correctly. What about 00 vs. C0.80,
    E0.80.80, FE.80.80.80.80.80.80, etc.? If the leading byte may take
    all 256 values, every one of those non-shortest ("overlong") forms
    decodes to the same value as plain 00.
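
    (To make that concrete, a small C sketch, again hypothetical code
    and not any real decoder: a naive decoder that derives the length
    from the lead byte but never rejects non-shortest forms.)

        #include <stdint.h>
        #include <stdio.h>

        /* Naive decoder: no overlong check, no bounds check (it assumes
         * the buffer holds a complete sequence). It maps 00, C0.80,
         * E0.80.80, ... FE.80.80.80.80.80.80 all to the same value 0. */
        static uint32_t decode_naive(const uint8_t *s)
        {
            uint8_t b = s[0];
            uint32_t v;
            int k = 1, i;

            if (b < 0xC0)                 /* ASCII (or a stray trail byte) */
                return b;
            while (b & 0x40) {            /* k = total sequence length,    */
                k++;                      /* read off the leading 1 bits   */
                b = (uint8_t)(b << 1);
            }
            v = s[0] & (0xFF >> (k + 1)); /* value bits of the lead byte   */
            for (i = 1; i < k; i++)
                v = (v << 6) | (s[i] & 0x3F);
            return v;
        }

        int main(void)
        {
            const uint8_t a[] = { 0x00 };
            const uint8_t b[] = { 0xC0, 0x80 };
            const uint8_t c[] = { 0xE0, 0x80, 0x80 };
            const uint8_t d[] = { 0xFE, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80 };

            /* Prints "0 0 0 0": four different byte strings, one decoded
             * value. That is exactly the ambiguity a byte-driven lexer
             * (or its error messages) would have to cope with. */
            printf("%lu %lu %lu %lu\n",
                   (unsigned long)decode_naive(a),
                   (unsigned long)decode_naive(b),
                   (unsigned long)decode_naive(c),
                   (unsigned long)decode_naive(d));
            return 0;
        }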

    Antoine


