Re: My Querry

From: Asmus Freytag (
Date: Wed Nov 24 2004 - 16:16:41 CST


    At 04:23 PM 11/23/2004, Chris Jacobs wrote:
    >Now, this implies that UTF-8 does interpret U+0000 as an ASCII NULL
    >control char.
    >This is incompatible with using it as a string terminator.

    Except that it's up to you how to interpret the C0 control codes in Unicode.
    You can do it according to ISO 6429 (where the NUL has a specific significance)
    or you can do it according to any other scheme.
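    A minimal sketch of that point, in Python and only for illustration: UTF-8
    encodes U+0000 as the single byte 0x00, and whether that byte ends the string
    is decided by the API that reads it, not by UTF-8. Here
    `ctypes.create_string_buffer(...).value` stands in for a C `char *` consumer.
    The variable names are hypothetical.

```python
import ctypes

# "A", U+0000, "B": UTF-8 encodes U+0000 as the single byte 0x00.
text = "A\u0000B"
encoded = text.encode("utf-8")

# Length-counted view: the byte string carries its own length, so 0x00 is
# ordinary data and decoding returns all three characters.
counted = encoded.decode("utf-8")

# NUL-terminated view: ctypes copies the bytes into a C char buffer, and
# .value reads it the way a C char* API would, stopping at the first 0x00.
c_view = ctypes.create_string_buffer(encoded).value

print(encoded)       # b'A\x00B'
print(len(counted))  # 3
print(c_view)        # b'A'
```

    The bytes are identical in both views. Only the consumer's convention makes
    0x00 a terminator.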

    The fact is that this ambiguity in interpreting control codes is long-standing
    practice: device control data streams treat these codes differently from the
    way plain text files (or literal data in programming languages) treat them.
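    As a small illustration that the same C0 codes get different treatment
    depending on who reads them, Python itself applies two conventions:
    `str.splitlines` treats VT, FF, FS, GS and RS as line boundaries in addition to
    CR and LF, while `bytes.splitlines` recognizes only CR, LF and CR LF. This is a
    sketch of the general point, not of any Unicode-defined default behavior.

```python
# One string containing TAB, VT (0x0B) and CR LF.
sample = "col1\tcol2\x0bnext\r\nlast"

# str.splitlines gives several C0 controls line-boundary semantics:
# LF, CR, CR LF, VT, FF, FS, GS and RS all end a line.
text_lines = sample.splitlines()

# bytes.splitlines recognizes only LF, CR and CR LF, so VT stays in the line.
byte_lines = sample.encode("ascii").splitlines()

print(text_lines)  # ['col1\tcol2', 'next', 'last']
print(byte_lines)  # [b'col1\tcol2\x0bnext', b'last']
```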

    Unicode simply recognized this and did not wish to enforce a particular usage.
    More recently, however, efforts have been under way to make the default
    behavior of several of the C0 controls formally match their customary semantics
    in plain text data.

    I'm not seeing much in this thread that adds to the store of knowledge on this
    issue, but I do see a number of statements that are easily misconstrued, among
    them the thoroughly discredited practice of storing information in the high bit
    when piping seven-bit data through eight-bit pathways. The problem with that
    approach, of course, is that the assumption that there would never be 8-bit
    data in these same pipes proved fatally wrong.
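    A minimal sketch of why that trick fails, assuming a hypothetical channel that
    stashes one flag per byte in bit 7 (`pack_flagged` and `unpack_flagged` are
    illustrative names, not from any real protocol): ASCII data survives the round
    trip, but the UTF-8 encoding of "é" (0xC3 0xA9) comes back as "C)".

```python
def pack_flagged(data: bytes, flags: list[bool]) -> bytes:
    """Hypothetical scheme: 7-bit data with a side-channel flag in bit 7."""
    # Assumes every data byte is < 0x80; any existing high bit is discarded.
    return bytes((b & 0x7F) | (0x80 if f else 0) for b, f in zip(data, flags))


def unpack_flagged(packed: bytes) -> tuple[bytes, list[bool]]:
    """Split each byte back into its 7-bit payload and its flag bit."""
    return bytes(b & 0x7F for b in packed), [bool(b & 0x80) for b in packed]


# ASCII payload: the round trip works because bit 7 was never used for data.
ascii_data, ascii_flags = unpack_flagged(pack_flagged(b"Hi", [True, False]))
print(ascii_data, ascii_flags)  # b'Hi' [True, False]

# UTF-8 "é" is 0xC3 0xA9; masking bit 7 turns it into 0x43 0x29.
utf8_data, _ = unpack_flagged(pack_flagged("é".encode("utf-8"), [False, False]))
print(utf8_data)  # b'C)'
```

    Nothing in the channel can tell a flag bit from a data bit, so the damage is
    silent as soon as genuine 8-bit data appears.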


    This archive was generated by hypermail 2.1.5 : Wed Nov 24 2004 - 16:18:40 CST