RE: Where is the First> Last> convention documented?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Sep 12 2007 - 06:29:37 CDT


    Kenneth Whistler wrote:
    > Note, however, as regards names in particular, that some
    > Unicode characters (e.g., noncharacters, private-use characters) don't
    > have character names, ...)

    I won't discuss the case of CJK and Hangul ranges, because they do have
    complete properties including standard names.
    But I still don't understand why the assigned controls and PUAs don't have
    at least a default character name, even one computed algorithmically (as
    for Hangul syllables and CJK ideographs).

    For the stability of applications that use these characters, it seems that
    these controls and PUAs should still have a standard name (maybe this name
    is simply "U+xxx"...). An application that needs to define a name property
    for these characters could then use it, instead of returning a non-unique
    empty name or raising an exception (as if the characters were unassigned),
    and it would avoid any possible future conflict with the standard names
    that other characters will receive.
    The missing names we most often encounter in text encoded in a valid UTF
    are those of the controls.
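For illustration, Python's standard `unicodedata` module shows exactly this gap: it returns the algorithmically derived names for CJK ideographs and Hangul syllables, while `unicodedata.name()` has no name to give for controls and private-use characters (it raises `ValueError`, or returns the default you pass). The four sample characters are simply examples chosen for this sketch.

```python
import unicodedata

# Two characters whose names are derived algorithmically (a CJK ideograph and
# a Hangul syllable) and two with no name at all (ESC and the first PUA).
samples = ["\u4E00", "\uAC00", "\u001B", "\uE000"]

results = {}
for ch in samples:
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is supplied; passing None makes the missing name visible.
    results[f"U+{ord(ch):04X}"] = (unicodedata.category(ch), unicodedata.name(ch, None))

for cp, (category, name) in results.items():
    print(cp, category, name)
```

This prints "CJK UNIFIED IDEOGRAPH-4E00" and "HANGUL SYLLABLE GA" for the first two, but None for U+001B (category Cc) and U+E000 (category Co).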

    Why does Unicode still not endorse the existing ISO 646 and ISO 8859 names
    for these C0 and C1 controls? Why would it be a problem to assign such
    names? (A name is just a name, not a description of the character's
    semantics or intended use in applications.)

    So:
    * instead of having just "<control>" for U+001B, why not have "<control>
    ESC" for the ASCII escape character (even though we know that some
    encodings do not treat it as a separate character but use it as part of
    the encoding scheme, which is NOT a standard UTF anyway)?
    * instead of having just "<private use>" for U+E000, why not have
    "<private use> E000" computed algorithmically as the standard name?

    As an alternative, you could specify that applications may generate the
    comment field or derive it algorithmically, so that strict compatibility
    is preserved for the existing name field. This would give the following
    extended names (respectively for the examples above):
    * "<control> #ESC"
    * "<private use> #E000"
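A minimal sketch of how an application could compute such names on its own side, assuming its own lookup table: `fallback_name` and `C0_ABBREVIATIONS` are hypothetical names, not part of Unicode or of any library. It covers only the C0 controls and DEL, using the usual ISO 646 / ISO/IEC 6429 abbreviations; C1 controls fall back to the hexadecimal code point, and the optional `marker` argument produces the "#" variant.

```python
import unicodedata

# Hypothetical application table: conventional abbreviations for the C0
# controls U+0000..U+001F (index = code point). DEL is handled separately;
# C1 controls are not covered and fall back to hex.
C0_ABBREVIATIONS = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def fallback_name(ch: str, marker: str = "") -> str:
    """Return the standard name, or a computed label for unnamed controls/PUAs.

    marker="" yields "<control> ESC"; marker="#" yields "<control> #ESC".
    """
    name = unicodedata.name(ch, None)
    if name is not None:
        return name  # characters with a real name keep it unchanged
    cp = ord(ch)
    category = unicodedata.category(ch)
    if category == "Cc":
        if cp < 0x20:
            suffix = C0_ABBREVIATIONS[cp]
        elif cp == 0x7F:
            suffix = "DEL"
        else:
            suffix = f"{cp:04X}"  # C1 controls: no table in this sketch
        return f"<control> {marker}{suffix}"
    if category == "Co":
        return f"<private use> {marker}{cp:04X}"
    # Other unnamed code points (noncharacters, unassigned, surrogates) are
    # outside the proposal; label them with the bare code point.
    return f"U+{cp:04X}"
```

With the defaults, `fallback_name("\u001B")` gives "<control> ESC" and `fallback_name("\uE000", "#")` gives "<private use> #E000"; a named character such as "A" keeps "LATIN CAPITAL LETTER A".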

    I don't see which other standard this would break.



    This archive was generated by hypermail 2.1.5 : Wed Sep 12 2007 - 06:31:43 CDT