Re: 32'nd bit & UTF-8

From: Mark E. Shoulson
Date: Thu Jan 20 2005 - 08:24:15 CST

    I've been slowly catching up on this thread. Isn't this just a case of
    GIGO? The issue at hand is how to handle ill-formed "code points" (i.e.
    arbitrary 32-bit values) where a program was expecting to deal only with
    Unicode values. Well, you've given it garbage in, so it should be expected
    to produce garbage out. If we choose to define the output garbage as
    some twisted generalization of UTF-8 (so that it doesn't require any
    special processing to generate), what's the problem? It isn't
    representing invalid characters as valid ones (it's representing them as
    invalid octet sequences), and other applications are completely welcome
    and invited to choke on them, as well they should, and as they would
    have had they been given the bad data we got. If other applications
    decide they know how to read our error codes, more power to them. But
    this isn't a matter of Unicode, or of redefining UTF-8. This is a
    matter of what we do when it ISN'T Unicode.
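
    To make the "twisted generalization" concrete, here is a rough sketch
    in Python. It is only an illustration under stated assumptions: the
    function name encode_generalized is hypothetical, and the 7-octet form
    with lead octet 0xFE is one plausible way to extend the old 6-octet
    pattern to reach the 32nd bit, not something defined by Unicode or by
    any RFC. Valid input comes out as ordinary UTF-8; garbage comes out as
    octets that a strict decoder rejects.

```python
def encode_generalized(value: int) -> bytes:
    """Encode any 32-bit value using the UTF-8 bit pattern.

    Hypothetical sketch: values above U+10FFFF (and surrogates) are NOT
    valid Unicode, so the octets produced for them are NOT valid UTF-8.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    if value < 0x80:
        return bytes([value])
    # Payload bits available in the 2- to 7-octet forms. The 7-octet
    # form (lead octet 0xFE) is an assumption made to reach 32 bits.
    for length, bits in ((2, 11), (3, 16), (4, 21), (5, 26), (6, 31), (7, 36)):
        if value < (1 << bits):
            break
    # Lead octet: `length` one-bits, then a zero, then the top payload bits.
    lead_mask = (0xFF << (8 - length)) & 0xFF
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (value & 0x3F))  # continuation octet 10xxxxxx
        value >>= 6
    return bytes([lead_mask | value] + tail[::-1])


def strict_decoder_accepts(octets: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the octets."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Valid Unicode in: ordinary UTF-8 out.
print(strict_decoder_accepts(encode_generalized(0x20AC)))
# Garbage in: invalid octets out, and a conforming reader chokes on them.
print(strict_decoder_accepts(encode_generalized(0xFFFFFFFF)))
```

    The encoder needs no special case for the bad values, which is the
    whole appeal; the burden of rejecting them stays with whoever reads
    the octets back.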


    Christopher Fynn wrote:

    > Hans Aberg wrote:
    > ...
    >> My guess is that all that fits into a computer will be binary numbers
    >> and transformations thereof. If you know of a counterexample, please
    >> let me know. But the point of computers seems also to be that humans
    >> can associate these binary numbers with various human-understandable
    >> structures. I believe the point of Unicode is that one associates
    >> characters with the Unicode numbers. So a CPBTF-8 would be a
    >> transformation where the code points are not thought to be associated
    >> with the Unicode characters, whereas I believe the point with Unicode
    >> is that one does associate Unicode numbers with characters.
    >> Hans Aberg
    > If that's so, discussion of this CPBTF-8 is out of scope on this list.
    > - chris
