RE: Where is the First> Last> convention documented?

From: Philippe Verdy (
Date: Wed Sep 12 2007 - 23:53:39 CDT

    Kenneth Whistler wrote:
    > And short identifiers don't follow the name syntax restrictions,
    > because they allow one character, "+", that is not allowed in character
    > names.
    Regarding my comment about missing names, I was not claiming that these
    supplementary names should be defined the same way as other assigned names.

    But referring to characters by name is better than referring to them by
    code point in many documents, as it makes the reference clearer.
    Even Unicode needs to assign local names to controls in many places to
    make things clearer (look at the documents and standard annexes about the
    BiDi algorithm and line/word breaking).

    When I spoke about ISO 8859-1 and ISO 646, I was speaking about their
    references to the C0 and C1 subsets, but also about their definitions as
    IANA charsets, which DO include the C0 and C1 subsets, not just the G0 and
    G1 characters.
    (There's a difference between "ISO-8859-1", the IANA charset made of ISO
    8859-1 for G0 plus the C0 controls, and "ISO 8859-1"; notice the addition
    of the hyphen. The same is true of "ISO 646" and "ISO-646".)
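
    As a rough illustration of the charset point above, here is a minimal
    Python sketch. It assumes that Python's "iso-8859-1" codec follows the
    IANA charset; the variable names are only illustrative. It checks that
    every byte, including the C0 and C1 ranges, decodes to the code point of
    the same value, and that those code points are classified as controls.

```python
import unicodedata

# Decode every possible byte value with Python's "iso-8859-1" codec
# (assumed here to follow the IANA charset of that name).
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# The C0 (0x00..0x1F) and C1 (0x80..0x9F) ranges.
c0 = [ch for ch in decoded if ord(ch) <= 0x1F]
c1 = [ch for ch in decoded if 0x80 <= ord(ch) <= 0x9F]

# Each byte maps to the code point with the same numeric value.
identity = all(ord(ch) == i for i, ch in enumerate(decoded))

# Both ranges carry the General_Category "Cc" (control).
c0_are_controls = all(unicodedata.category(ch) == "Cc" for ch in c0)
c1_are_controls = all(unicodedata.category(ch) == "Cc" for ch in c1)

print(identity, len(c0), len(c1), c0_are_controls, c1_are_controls)
```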

    Even if the various references do not agree on the names assigned to C0
    and C1 controls, at least one name should be specified consistently for
    use in Unicode/ISO 10646 contexts.

    When I spoke about possible conflicts, it's because applications frequently
    need to display names for controls. These names will preferably be those
    assigned by Unicode and ISO 10646 when they exist, but if they are missing,
    the names will be inferred in some way, using the historic "na1" property
    if available, or some other legacy convention, causing possible confusion
    if there's no agreed convention.
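
    The fallback behaviour described above might look like the following
    Python sketch. The table LEGACY_CONTROL_NAMES and the function
    display_name are hypothetical, written only for illustration: the table
    is a hand-picked stand-in for whatever legacy source an application
    happens to consult, not a normative list.

```python
import unicodedata

# Hypothetical, hand-picked fallback table (illustration only). A real
# application might fill it from the "na1" data or another legacy source,
# which is exactly where two applications can end up disagreeing.
LEGACY_CONTROL_NAMES = {
    0x0000: "NULL",
    0x0009: "CHARACTER TABULATION",
    0x000A: "LINE FEED",
    0x000D: "CARRIAGE RETURN",
    0x001B: "ESCAPE",
    0x0085: "NEXT LINE",
}


def display_name(ch: str) -> str:
    """Return a name suitable for display for a single character."""
    # Prefer the formal character name when one is assigned.
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    # Controls have no formal name, so fall back to the legacy table.
    cp = ord(ch)
    legacy = LEGACY_CONTROL_NAMES.get(cp)
    if legacy is not None:
        return legacy
    # Last resort: show the code point itself.
    return f"U+{cp:04X}"


for ch in ("A", "\n", "\x81"):
    print(repr(ch), display_name(ch))
```

    Two applications using different fallback tables would show different
    names for the same control, which is the confusion I mean.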

    Note that I know that not all C1 controls have names, but names do appear
    in IBM references about EBCDIC, from which these controls were inherited
    and remapped into C1 controls. The names are used in transcoding tables
    (which have existed since long before Unicode/ISO 10646).

    I don't see why assigning a name (possibly through a separate property)
    to these controls would be a problem for Unicode and ISO 10646 stability.
    It's clear that these names do exist in many other references, notably
    within many RFCs and protocol specifications. You just need to choose a name
    that matches the most common usage (even if there are other inconsistent
    assignments in other references, which may be deprecated or were never
    meant to be normative).
