Re: String name and Character Name

From: Hans Aberg (haberg@math.su.se)
Date: Sat Apr 23 2005 - 15:40:57 CST


    At 13:46 -0700 2005/04/23, Asmus Freytag wrote:
    >>So, say one wants to correct "BRAKCET" to "BRACKET", then the new
    >>version of UnicodeData.txt will look like:
    >> FE17;PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET;Ps;...
    >> FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET;Pe;...
    >> FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;Pe;...
    >> FE19;PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS;Po;...
    >>If somebody refers to U+FE18 as "PRESENTATION FORM FOR VERTICAL
    >>RIGHT WHITE LENTICULAR BRAKCET", it will be recognized, but if the
    >>character is first translated into its code point, and then back to
    >>a character name, one gets back "PRESENTATION FORM FOR VERTICAL
    >>RIGHT WHITE LENTICULAR BRACKET". Of course, this last name will be
    >>recognized as well.
    >
    >This particular implementation of your idea would create a huge
    >mess, as many, many tools expect a single line for each character in
    >UnicodeData.txt. Your approach would break all these tools.
    >
    >But you are not alone in having considered aliases. The UTC has
    >developed a very stable system of naming properties, where aliases
    >have been used to correct typos and other issues.
    >
    >See my other post to this list from a few minutes ago.
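
    For illustration, the lookup behaviour described in the quoted
    proposal can be sketched in Python. This is only a sketch under
    assumptions: the helper name load_names is hypothetical, the sample
    lines keep just the first three fields (the remaining fields are
    elided in the post), and a later line for the same code point is
    assumed to replace an earlier one as the name reported back.

```python
from typing import Dict, Iterable, Tuple

# Sample lines in the proposed format; trailing fields are omitted here.
SAMPLE = [
    "FE17;PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET;Ps",
    "FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET;Pe",
    "FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;Pe",
    "FE19;PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS;Po",
]


def load_names(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build name -> code point and code point -> name tables.

    Hypothetical helper: several lines may share one code point.
    """
    name_to_cp: Dict[str, int] = {}
    cp_to_name: Dict[int, str] = {}
    for line in lines:
        fields = line.split(";")
        cp = int(fields[0], 16)
        name = fields[1]
        name_to_cp[name] = cp   # every listed name is recognized
        cp_to_name[cp] = name   # assumption: the last line listed wins
    return name_to_cp, cp_to_name


name_to_cp, cp_to_name = load_names(SAMPLE)
old_name = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET"
new_name = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"

# Both spellings resolve to the same code point.
same_cp = name_to_cp[old_name] == name_to_cp[new_name] == 0xFE18
# Going from the old name to the code point and back gives the new name.
round_trip = cp_to_name[name_to_cp[old_name]]
```

    A tool that assumes exactly one line per code point would not read
    the file this way, which is the breakage objected to above.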

    Clearly, if Unicode characters can now have more than one name,
    where formerly they had exactly one, software must be rewritten to
    accommodate that. I leave it to the engineers to figure out which
    method would be the least painful.
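
    One arrangement that might be less painful, sketched here purely as
    an assumption and not as a description of what the UTC or any
    existing tool does, leaves the main table at one name per character
    and keeps additional names in a separate table. The layout and the
    names PRIMARY_NAMES, NAME_ALIASES and code_point_for are all
    hypothetical.

```python
from typing import Dict

# Main table: exactly one name per code point, as existing tools expect.
# Assumption for this sketch: the original spelling stays here unchanged.
PRIMARY_NAMES: Dict[int, str] = {
    0xFE17: "PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET",
    0xFE18: "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET",
}

# Hypothetical side table: extra names (here a corrected spelling).
NAME_ALIASES: Dict[str, int] = {
    "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET": 0xFE18,
}


def code_point_for(name: str) -> int:
    """Resolve a primary name or an alias to its code point."""
    for cp, primary in PRIMARY_NAMES.items():
        if primary == name:
            return cp
    if name in NAME_ALIASES:
        return NAME_ALIASES[name]
    raise KeyError(name)
```

    Software that reads only the main table keeps working unchanged,
    while software that also consults the side table accepts either
    spelling.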

    -- 
       Hans Aberg
    

