Re: Misuse of 8th bit [Was: My Querry]

From: Antoine Leca (
Date: Fri Nov 26 2004 - 05:50:50 CST

    On Thursday, November 25th, 2004 08:05Z Philippe Verdy wrote:
    > In ASCII, or in all other ISO 646 charsets, code positions are ALL in
    > the range 0 to 127. Nothing is defined outside of this range, exactly
    > like Unicode does not define or mandate anything for code points
    > larger than 0x10FFFF, should they be stored or handled in memory with
    > 21-, 24-, 32-, or 64-bit code units, more or less packed according to
    > architecture or network framing constraints.
    > So the question of whether an application can or cannot use the extra
    > bits is left to the application, and this has no influence on the
    > standard charset encoding or on the encoding of Unicode itself.

    What you seem to miss here is that, given that computers are nowadays based
    on 8-bit units, there was a strong move in the '80s and the '90s to
    _reserve_ ALL 8 bits of the octet for characters. And what A. Freitag was
    asking was precisely to avoid bringing in different ideas about ways to
    encode other classes of information in the 8th bit of an ASCII-based
    storage of a character.
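
    To make the clash concrete, here is a minimal sketch in Python of what
    happens when an application borrows the 8th bit of an ASCII octet as a
    private flag. The names FLAG and tag are invented for this illustration
    and come from no standard.

```python
FLAG = 0x80  # hypothetical private flag stored in the 8th bit


def tag(ch: str) -> int:
    """Return the octet for a 7-bit ASCII character with the private flag set."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return code | FLAG


octet = tag("A")  # 0x41 | 0x80 == 0xC1

# A receiver that treats all 8 bits as character data (here ISO 8859-1)
# sees a different character rather than a flagged "A".
as_latin1 = bytes([octet]).decode("latin-1")

# A receiver expecting UTF-8 rejects the octet outright.
try:
    bytes([octet]).decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

    Under ISO 8859-1 the flagged "A" reads as the character at 0xC1, and
    under UTF-8 it is malformed, so the private flag only survives between
    parties that have agreed on it beforehand.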

    In a similar vein, I cannot agree that it could be advisable to use the
    22nd, 23rd, 32nd, 63rd, etc. bits, that is, the upper bits of the storage
    of a Unicode codepoint. Right now, nobody sees any use for them as part of
    characters, but history should have taught us that we should prevent this
    kind of optimisation from occurring. Particularly when it is NOT defined
    by the standards: such a situation leads everybody and his dog to find his
    own particular "optimum" use for this "free space", and these classes of
    optima do not generally coincide with each other...
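
    As an illustration of the defensive stance I am arguing for, here is a
    rough Python sketch of a decoder for codepoints stored in 32-bit units
    that refuses anything beyond 0x10FFFF instead of assigning a meaning to
    the upper bits. The names MAX_CODE_POINT and decode_unit are my own
    assumptions for this example; the sketch does not deal with surrogates
    or noncharacters.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def decode_unit(unit: int) -> str:
    """Decode one 32-bit storage unit, refusing values past the code point range."""
    if not 0 <= unit <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit unit")
    if unit > MAX_CODE_POINT:
        # The value lies outside the range Unicode defines, so some upper
        # bit is being used for something other than the character.
        raise ValueError(f"outside the Unicode code point range: {unit:#010x}")
    return chr(unit)


plain = decode_unit(0x41)

# Hypothetical application flag squeezed into bit 31 of the storage unit.
private = 0x41 | (1 << 31)
try:
    decode_unit(private)
    accepted = True
except ValueError:
    accepted = False
```

    A consumer written this way fails loudly on the first unit carrying a
    private flag, which is the behaviour one wants if those bits are to stay
    reserved.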

