RE: Where is the First> Last> convention documented?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Sep 13 2007 - 15:34:52 CDT

    Kenneth Whistler [mailto:kenw@sybase.com] wrote:
    > > Regarding my comment about missing names, I was not pretending that these
    > > complemented names should be defined the same way as other assigned names.
    >
    > I didn't assume that you were *pretending* that to be the case;
    > I observed that you were *asserting* it to be the case.
    >
    > > But references to characters by name is better than reference by codepoint
    > > in many documents as it makes the reference clearer.
    >
    > Ah, now you change your tune. I have no quarrel with that claim. Certainly
    > being able to refer to common use control codes by names such
    > as "tab" and "carriage return" instead of hexadecimal U+0009 and
    > U+000D makes the intent clearer to everyone -- even those of us
    > who spend much of our day thinking in hexadecimal.
    >
    > But in your prior contribution, you were talking about alleged
    > problems of stability of applications because of characters which
    > currently have no normatively defined character name attribute.

    I have changed neither my tune nor my underlying conviction, even if what I
    said was not clear and could be interpreted differently.

    The need for stable names for C0 and C1 controls remains. When I speak about
    stability, I do not mean stability within the Unicode standard itself (such
    names are still not present there), but within applications or documents
    that need names to reference these controls more clearly than just U+00xx.
    That notation is unambiguous but not clear enough for readers, given that
    even Unicode needs to define "aliases" to reference them in many places in
    its annexes.

    So your attempt to say that the proposed names using "<>" or "#" within names
    were non-conforming is not relevant. What applications need are stable names,
    even if those names come from another character property that does not
    follow the current rules for existing standard character names. After all,
    Unicode references the "na1" property (see the proposed XML format for the
    UCD), and could just as well add another property if it does not want to
    change the values of existing properties. And we already have lots of other
    properties for CJK ideographs.

    The most commonly used names are those based on two- or three-character
    abbreviations, so these "aliases" are still the best: "NUL, ..., TAB, LF, VT,
    FF, CR, ... DC1, ..., CSI, ...".
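
    As a rough illustration of what such abbreviation-based aliases look like
    inside an application, here is a minimal Python sketch. The names
    CONTROL_ALIASES and describe are hypothetical, the table covers only the
    abbreviations quoted above, and it is an application-local convention, not
    a normative Unicode property. It relies on the fact that the standard
    character name lookup yields nothing for C0 and C1 controls.

```python
import unicodedata

# Hypothetical, application-local alias table (an assumption for illustration,
# not a Unicode-defined property). Only the abbreviations named in the message.
CONTROL_ALIASES = {
    "NUL": 0x00,
    "TAB": 0x09,
    "LF": 0x0A,
    "VT": 0x0B,
    "FF": 0x0C,
    "CR": 0x0D,
    "DC1": 0x11,
    "CSI": 0x9B,
}

# Reverse mapping: code point -> alias.
CODE_POINT_TO_ALIAS = {cp: alias for alias, cp in CONTROL_ALIASES.items()}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME', falling back to the local alias for controls."""
    cp = ord(ch)
    # Control characters have no character name, so the default is returned.
    name = unicodedata.name(ch, None)
    if name is None:
        name = CODE_POINT_TO_ALIAS.get(cp)
    label = f"U+{cp:04X}"
    return f"{label} {name}" if name else label


if __name__ == "__main__":
    for ch in ("\t", "\r", "\x9b", "A"):
        print(describe(ch))
```

    A document or specification could then write "TAB" and resolve it through
    one shared table instead of redefining the name locally each time.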

    I would not adopt Keld's two-character mnemonics, as they are broken even if
    they remain in old charset definition RFCs. Those have long been deprecated
    in favor of charset tables that use Unicode/ISO 10646 code points as the
    central encoding, and of the mappings published in Unicode (even if these
    are informative, they have equivalent content, and these tables are now used
    in many systems, possibly compiled into some proprietary binary format).

    At the very least, these names would simplify the writing of new
    specifications, and could help disambiguate some old RFCs by making them
    more precise, if a normative reference were available to specify them
    without long lists of local definitions in each document that needs them
    (including the Unicode standard annexes, where these names are needed and
    redefined locally).
