From: Mark E. Shoulson (mark@kli.org)
Date: Thu Jan 20 2005 - 08:24:15 CST
I've been slowly catching up on this thread. Isn't this just a case of
GIGO? The issue at hand is how to handle ill-formed "code points" (i.e.,
32-bit values) where a program was expecting to be dealing only with
Unicode values. Well, you've given it garbage in; it should be expected
to produce garbage out. If we choose to define the output garbage as
some twisted generalization of UTF-8 (so that it doesn't require any
special processing to generate), what's the problem? It isn't
representing invalid characters as valid ones (it's representing them as
invalid octet sequences), and other applications are completely welcome
and invited to choke on them, as well they should, and as well they
would have, had they been given the bad data we got. If other
applications decide they know how to read our error codes, more power to
them. But this isn't a matter of Unicode, or of redefining UTF-8. This
is a matter of what we do when it ISN'T Unicode.
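For concreteness, one plausible shape for such a generalized encoding (not necessarily what anyone in this thread had in mind) is the original pre-RFC-3629 UTF-8 bit-packing, which extended the 1- through 4-byte patterns with 5- and 6-byte forms covering values up to 0x7FFFFFFF. Values past U+10FFFF then come out as octet sequences that are deliberately ill-formed by today's UTF-8 definition, which is exactly the "invalid octet sequences" property described above. The function name is mine; a full 32-bit value above 0x7FFFFFFF would need a further extension beyond even this scheme.

```python
def encode_generalized(cp: int) -> bytes:
    """Encode 0..0x7FFFFFFF with UTF-8's bit-packing, including the
    obsolete 5- and 6-byte forms. Matches standard UTF-8 for values
    that are valid Unicode scalar values; produces ill-formed (by
    RFC 3629) octet sequences for anything above U+10FFFF."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range of this scheme")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, single byte
    # (upper limit, lead-byte marker bits, total bytes in the sequence)
    for limit, marker, nbytes in ((0x800, 0xC0, 2),
                                  (0x10000, 0xE0, 3),
                                  (0x200000, 0xF0, 4),
                                  (0x4000000, 0xF8, 5),
                                  (0x80000000, 0xFC, 6)):
        if cp < limit:
            out = bytearray(nbytes)
            # Fill continuation bytes from the low bits upward.
            for i in range(nbytes - 1, 0, -1):
                out[i] = 0x80 | (cp & 0x3F)
                cp >>= 6
            out[0] = marker | cp  # remaining high bits go in the lead byte
            return bytes(out)
```

For example, 0x110000 (one past the last Unicode code point) encodes as F4 90 80 80, a sequence a conforming UTF-8 decoder must reject, so downstream applications get to choke on it just as the text suggests.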
~mark
Christopher Fynn wrote:
> Hans Aberg wrote:
>
> ...
>
>> My guess is that all that fits into a computer will be binary numbers
>> and transformations thereof. If you know of a counter-example, please
>> let me know. But the point of computers seems also to be that humans
>> can associate these binary numbers with various human-understandable
>> structures. I believe the point of Unicode is that one associates
>> characters with the Unicode numbers. So a CPBTF-8 would be a
>> transformation where the code points are not thought to be associated
>> with the Unicode characters, whereas I believe the point of Unicode
>> is that one does associate Unicode numbers with characters.
>>
>> Hans Aberg
>
>
> If that's so, discussion of this CPBTF-8 is out of scope on this list.
>
> - chris
>
This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 08:25:19 CST