Re: numeric properties of Nl characters in the UCD

From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Nov 26 2003 - 12:02:50 EST


    I agree that the numeric values should be set properly where they exist,
    following the precedent of other scripts. In practice, however, with non-decimal
    systems the programmer needs to know much more about how the numbering
    system works than just the numeric values, so the fact that we record
    Roman numeral C as having the value 100 does not play a significant role
    in anyone's implementing Roman numerals correctly.
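
    To illustrate the point, here is a minimal Python sketch. It assumes the
    standard unicodedata module and the Roman numeral characters in the
    U+2160 block; the helper names naive_sum and roman_value are invented for
    this example. Adding up the per-character numeric values is not enough,
    because the subtractive rule is knowledge about the system rather than
    about any single character.

```python
import unicodedata

def naive_sum(s):
    """Add up the per-character numeric values, ignoring position."""
    return sum(int(unicodedata.numeric(ch)) for ch in s)

def roman_value(s):
    """Apply the subtractive rule: a smaller value placed directly
    before a larger one is subtracted instead of added."""
    values = [int(unicodedata.numeric(ch)) for ch in s]
    total = 0
    for i, v in enumerate(values):
        if i + 1 < len(values) and v < values[i + 1]:
            total -= v
        else:
            total += v
    return total

# ROMAN NUMERAL TEN followed by ROMAN NUMERAL ONE HUNDRED, i.e. "XC"
xc = "\u2169\u216D"
print(naive_sum(xc), roman_value(xc))
```

    The character data supplies the 10 and the 100, but only the second
    function encodes how the two combine. It is still only a sketch: it does
    not validate its input and does not handle other historical forms.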

    There was a very useful book on numbering systems: "Histoire universelle des
    chiffres" by Georges Ifrah. And I just found out that there is a new revision
    out in English. ("Calendrical Calculations" also has a new edition out, BTW).

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    To: "Michael Everson" <everson@evertype.com>
    Cc: "Unicode@Unicode.Org" <unicode@unicode.org>
    Sent: Wed, 2003 Nov 26 06:40
    Subject: RE: numeric properties of Nl characters in the UCD

    > Michael Everson writes:
    > > >But why do U+10341 [GOTHIC LETTER NINETY] and U+1034A [GOTHIC LETTER NINE
    > > >HUNDRED], which are letters that are only ever used to represent the
    > > >numbers 90 and 900 respectively (they have no intrinsic phonetic
    > > >value), not have a numeric value assigned to them?
    > >
    > > Because there's no particular value in doing so.
    > >
    > > The burden is on you (or whomever) to prove that there would be.
    > > Otherwise, if it ain't broke, don't fix it.
    >
    > The cost of such exceptions is that an application cannot reliably use the
    > general categories to detect, evaluate or create numbers in a relevant
    > script. So this requires a separate table for each supported script.
    >
    > This unnecessarily complicates algorithms that support internationalized
    > numeric strings, in an area where it could be very simply fixed.
    >
    > We do need characters that have a numeric property to be defined either as
    > "Nd" (with three non-empty numeric property values), or "Ni" (with two
    > non-empty numeric property values), or "Nl" (with one non-empty numeric
    > property value) or "No", i.e. "Number, Other" (with no non-empty numeric
    > properties), and that NO other category than "Mn" can have non-empty numeric
    > properties.
    >
    > > >BTW I've just noticed that U+10341 has a general category of
    > > "Lo" (Letter,
    > > >Other), whereas U+1034A has a general category of "Nl" (Number,
    > > Letter), which
    > > >seems a little odd.
    > >
    > > It does.
    >
    > And it is fixable...
    >

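    The category and numeric-value question raised in the quoted message can
    be checked against whatever version of the character database a given
    runtime ships. The Python sketch below assumes the standard unicodedata
    module; the helper name describe is invented for this example. The
    results depend on the Unicode version bundled with the interpreter, so
    they need not match the state of the data discussed in this thread.

```python
import unicodedata

def describe(ch):
    """Return (general category, numeric value or None) for one character."""
    return unicodedata.category(ch), unicodedata.numeric(ch, None)

# Version of the Unicode Character Database bundled with this interpreter
print("UCD version:", unicodedata.unidata_version)

# The two Gothic letters from the thread, plus a Roman numeral for comparison
for cp in (0x10341, 0x1034A, 0x216D):
    ch = chr(cp)
    category, value = describe(ch)
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{cp:04X} {name}: category={category}, numeric={value}")
```

    A character that reports a numeric value of None here would have to be
    covered by a separate, script-specific table, which is the cost the
    quoted message describes.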


    This archive was generated by hypermail 2.1.5 : Wed Nov 26 2003 - 12:58:27 EST