Re: Do non-positional number systems present security issues?

From: Mark Davis ☕ (mark@macchiato.com)
Date: Fri Apr 16 2010 - 12:41:55 CDT


    Yes, the characters with Nd are by design those that can be used with
    "normal" big-endian positional decimal syntax, whereby a sequence of such
    digits {N0, N1, N2, ..., Nn} has the numeric value
    (...((N0 * 10 + N1) * 10 + N2) * 10 + ...) * 10 + Nn.
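
    As a rough illustration of that fold, here is a minimal Python sketch. The
    function name decimal_value is hypothetical, and it assumes the standard
    library's unicodedata tables are an acceptable stand-in for the Unicode
    Character Database; it is not taken from any particular implementation.

```python
import unicodedata


def decimal_value(digits: str) -> int:
    """Fold a string of Gc=Nd characters into an integer.

    The most significant digit comes first, so each step multiplies the
    running value by 10 and adds the next digit's value.
    """
    value = 0
    for ch in digits:
        # Only Gc=Nd characters take part in positional decimal syntax;
        # other numeric characters (Nl, No) are rejected here.
        if unicodedata.category(ch) != "Nd":
            raise ValueError(f"{ch!r} is not a decimal digit (Gc=Nd)")
        value = value * 10 + unicodedata.decimal(ch)
    return value
```

    Under these assumptions the function treats ASCII "47" and the Bengali
    digits U+09EA U+09ED identically, while a character such as U+00BD
    (VULGAR FRACTION ONE HALF, general category No) raises ValueError.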

    Numeric characters that are peculiar in some fashion, and cannot simply be
    interpreted in the above fashion, are marked as numbers, but not Nd. Here is
    the list, grouped by General Category:

    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{n}&g=gc

    and by General Category then Numeric Value:

    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{n}&g=gc+nv

    Note that for security, implementations may want to put a further
    restriction on sequences of Nd, so as to not mix scripts. Other measures may
    also be needed, to detect problems like ৪৭, which looks like 89 (if you have
    the font), but is actually 47 written in Bengali digits. For more info, see
    UTS #36 (see proposed version at
    http://www.unicode.org/reports/tr36/proposed.html).
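
    One possible way to sketch the "do not mix" restriction in Python is shown
    below. The standard library does not expose the Script property, so this
    sketch uses a proxy: it assumes each set of Nd digits occupies ten
    consecutive code points starting at its zero, and identifies a digit's set
    by the code point of that zero. The names digit_sets and mixes_digit_sets
    are hypothetical, and the check is stricter than a per-script test (for
    example, it separates the several mathematical digit sets).

```python
import unicodedata


def digit_sets(text: str) -> set:
    """Return the zero code point of every Nd digit set used in text.

    Assumption: each Nd digit set is a contiguous run of ten code points
    in ascending order, so subtracting a digit's value from its code
    point yields the code point of that set's zero.
    """
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros


def mixes_digit_sets(text: str) -> bool:
    """True if text contains Nd digits drawn from more than one digit set."""
    return len(digit_sets(text)) > 1
```

    Note that a string written entirely in Bengali digits, such as the ৪৭
    example, passes this check because it uses a single digit set; catching
    that kind of lookalike needs a separate confusability check.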

    If there are any discrepancies in the properties of the above characters,
    those can be brought to the attention of the UTC using the reporting form
    (the next meeting is in May).

    Mark

    — Il meglio è l’inimico del bene —

    On Fri, Apr 16, 2010 at 09:14, karl williamson <public@khwilliamson.com> wrote:

    > Thanks for your response.
    >
    >
    > Shriramana Sharma wrote:
    >
    >> On 2010-Apr-12 22:39, karl williamson wrote:
    >>
    >>> Can anyone tell me: Are there other scripts where Gc=Nd characters can
    >>> behave with other than the positional meanings of the digits 0-9? The
    >>> only technical note that has "number" in the title is the one that
    >>> Shriramana mentioned, so I'm assuming not.
    >>>
    >>
    >> How about Telugu? IIRC the original proposal for the Telugu fractions
    >> submitted by Nagarjuna Venna has examples for the Telugu digits being used
    >> as modifiers for the fractions or something.
    >>
    >
    > I looked this up, and found a paper by N. Venna, and it looks like what
    > Unicode adopted was things like U+0C78: TELUGU FRACTION DIGIT ZERO FOR ODD
    > POWERS OF FOUR. But their category is No, not Nd.
    >
    >
    >
    >> And for Devanagari? The above same for the "generic North Indic fractions"
    >> proposed by Anshuman Pandey.
    >>
    >
    > I looked this up as well, and these fractions, e.g. U+A830: NORTH INDIC
    > FRACTION ONE QUARTER, also have general category No.
    >
    > Apparently there are no other cases of non-positional notation digits
    > having general category=Nd.
    >
    >
    >
    >


