Re: String name and Character Name

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Mon Apr 25 2005 - 01:02:42 CST

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
    Behalf Of Asmus Freytag
    Sent: 23 April 2005 22:52
    To: Hans Aberg
    Cc: Unicode
    Subject: Re: String name and Character Name

    > At 02:40 PM 4/23/2005, Hans Aberg wrote:
    > >At 13:46 -0700 2005/04/23, Asmus Freytag wrote:
    > >>>So, say one wants to correct "BRAKCET" to "BRACKET", then the new
    > >>>version of UnicodeData.txt will look like:
    > >>> FE17;PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR
    > >>> BRACKET;Ps;...
    > >>> FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR
    > >>> BRAKCET;Pe;...
    > >>> FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR
    > >>> BRACKET;Pe;...
    > >>> FE19;PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS;Po;...
    > >I leave it to the engineers to
    > >figure out what might be considered a less painful method.
    >
    > --"Just leave the driving to us."

    Well, I'm a software engineer too, so I guess I'm allowed to comment here. This
    suggestion:

    > >>> FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR
    > >>> BRAKCET;Pe;...
    > >>> FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR
    > >>> BRACKET;Pe;...

    will break existing software. Or at least, it will break software which /I/
    have written. That is perhaps not so bad, as mine is not commercially deployed,
    but I'm guessing there's commercially deployed software out there which made
    the same assumption as I did: that UnicodeData.txt contains at most one line
    per codepoint, listed in ascending numerical order. If that assumption is
    invalid, my code breaks.

    Okay, so that's not much of a big deal, as it wouldn't be /that/ much effort for
    me to write in a fix, but it might be more difficult for code which is already
    deployed.
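
    To make that assumption concrete, here is a minimal sketch (in Python, and
    not the code referred to above) of a loader that expects one line per
    codepoint in ascending order. The function name and the sample lines are
    hypothetical, and the samples keep only the first three fields, as the
    thread elides the rest.

```python
def load_names_strict(lines):
    """Parse 'code;name;...' lines, assuming one line per codepoint, ascending."""
    names = {}
    previous = -1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        # The assumption under discussion: codepoints strictly increase.
        if code <= previous:
            raise ValueError(f"U+{code:04X} repeated or out of order")
        names[code] = fields[1]
        previous = code
    return names


# Hypothetical, truncated sample lines (only code, name, general category).
SAMPLE = [
    "FE17;PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET;Ps",
    "FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET;Pe",
    "FE19;PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS;Po",
]

# The proposed layout: a second FE18 line carrying the corrected spelling.
SAMPLE_WITH_DUPLICATE = SAMPLE[:2] + [
    "FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;Pe",
] + SAMPLE[2:]

names = load_names_strict(SAMPLE)          # parses cleanly
try:
    load_names_strict(SAMPLE_WITH_DUPLICATE)
except ValueError as error:
    print("rejected:", error)
```

    A loader without the ordering check would not fail loudly; it would simply
    keep whichever FE18 line it read last, which is arguably worse.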

    On the other hand...

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
    Behalf Of Asmus Freytag
    Sent: 23 April 2005 21:43
    To: Peter Kirk; Doug Ewell
    Cc: Unicode Mailing List
    Subject: Re: String name and Character Name

    > But in the spirit of hypothesizing a solution, I would consider using an
    > alias mechanism in the way aliases are used for Property names the best
    > solution. For properties (and their values) there exist multiple aliases,
    > which are all considered unique.

    That would work, and wouldn't break anything.
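
    As a rough illustration of why an alias mechanism leaves existing parsers
    alone, the sketch below keeps the published names untouched and layers the
    corrected spelling on from a separate table. The table layout and the
    function name are assumptions made for this example, not anything proposed
    in the thread.

```python
# Published names stay exactly as they are (typo included).
NAMES = {
    0xFE18: "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET",
}

# Hypothetical separate alias table: codepoint -> extra names.
ALIASES = {
    0xFE18: ["PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"],
}


def build_lookup(names, aliases):
    """Map every name and alias to its codepoint, rejecting collisions."""
    lookup = {name: code for code, name in names.items()}
    for code, alias_list in aliases.items():
        for alias in alias_list:
            # Aliases must be unique too, like the names themselves.
            if lookup.get(alias, code) != code:
                raise ValueError(f"alias {alias!r} already names another codepoint")
            lookup[alias] = code
    return lookup


lookup = build_lookup(NAMES, ALIASES)
# Both the original (misspelled) name and the alias resolve to U+FE18.
```

    Software that only reads the names table never sees the aliases, so
    nothing it relies on changes.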

    BUT... I still don't see the point. If the purpose of names is to be a unique
    identifier, then aliases are not needed. The existing names /already serve that
    purpose/. On the other hand, if the purpose of names is to be meaningful to
    humans (regardless of their language), then the CLDR suggestion still seems
    like the best idea to me.

    And though the names presented by BabelPad and its ilk may sometimes be
    misleading, it is difficult to criticise them too harshly while an alternative
    does not (yet) exist (although it would be nicer if TUS had gone to more effort
    to point out that "these names are not supposed to be meaningful").

    My vote ... (if I had one) ... would go to the CLDR idea, indexed by codepoint
    (so we can ignore the names altogether). And when that's complete (at least for
    English), TUS should encourage applications to present CLDR-localized names to
    end-users in place of the ISO name. From a software point of view, that's kinda
    easy - if you've already got a locale discrimination mechanism in place, then
    it's just one more file to parse.
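
    A minimal sketch of what that lookup could look like, assuming the
    localized names have already been parsed into per-locale tables keyed by
    codepoint. The table contents, the locale tags and the function name are
    all invented for illustration; no such data file existed at the time of
    writing.

```python
# ISO names, as loaded from the existing data file.
ISO_NAMES = {
    0xFE18: "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET",
}

# Hypothetical localized display names, indexed by locale and then codepoint.
LOCALIZED = {
    "en": {0xFE18: "vertical right white lenticular bracket"},
}


def display_name(code, locale, localized=LOCALIZED, iso_names=ISO_NAMES):
    """Return a localized name if one exists, else the ISO name, else U+XXXX."""
    # Try the full locale first ("en_GB"), then its bare language ("en").
    candidates = [locale]
    if "_" in locale:
        candidates.append(locale.split("_")[0])
    for candidate in candidates:
        table = localized.get(candidate, {})
        if code in table:
            return table[code]
    # No localized name available: fall back to the ISO name.
    return iso_names.get(code, f"U+{code:04X}")


print(display_name(0xFE18, "en_GB"))  # found via the "en" table
print(display_name(0xFE18, "fr"))     # no "fr" table, so the ISO name is used
```

    Because the tables are keyed by codepoint, the ISO name is only ever a
    fallback and its spelling stops mattering to end-users.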

    Jill


