Re: Roman numerals

From: Kenneth Whistler (
Date: Wed Feb 17 2010 - 18:01:54 CST

    Michel Bottin responded:

    > On 17/02/10 23:16, John H. Jenkins wrote:
    > > The Roman numerals are there for the sake of compatibility with
    > > older standards only and their use should be avoided. It's better
    > > to simply build the Roman numerals you want to use out of the
    > > appropriate Latin letters.
    > >
    > But then we lack the numeric order.

    The numerical ordering of Roman numerals is not in the scope of
    the Unicode Standard, nor, for that matter, even in the scope of
    the Unicode Collation Algorithm.

    > For example, for the numbers 1-24,
    > 30, 40, 50, 60, 70, 80, 90, 100 the collating sequence of the Latin
    > letters gives:
    > C, I, II, III, IV, IX, L, LX, LXX, LXXX, M, V, VI, VII, VIII, X, XC, XI,
    > XXIV, XXX


    You get the same kind of mishmash for every acrophonic or other
    letter-based numerical system out there, including Greek letters
    used as numerals and Hebrew letters used as numerals.

    > and then for the kings of France, "Louis IX" (Saint Louis) precedes
    > "Louis V" and a hypothetical "Louis XIX" would have preceded "Louis XIV"!

    No, the *string* "Louis IX" collates as less than the *string* "Louis V"
    (either in binary order or in UCA collation order), but then the
    *string* "Table 10" collates as less than the *string* "Table 5"
    (either in binary order or in UCA collation order), too. Such
    problems of ordering of numerals embedded in text are not addressed
    by a character encoding.
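
    A minimal sketch of the point above, in Python, assuming plain
    code point comparison (what sorted() does on str) stands in for
    binary order. The helper names roman_to_int and regnal_key are
    hypothetical, and the parser assumes well-formed numerals. The
    numeric ordering comes from application logic layered on top of
    the strings, not from the encoding or from collation.

```python
# Hypothetical helpers for illustration; not part of Unicode or the UCA.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral built from Latin letters to an int."""
    total = 0
    prev = 0
    # Scan right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def regnal_key(name: str) -> tuple[str, int]:
    """Sort key: the name, then the numeric value of the trailing numeral."""
    base, numeral = name.rsplit(" ", 1)
    return (base, roman_to_int(numeral))


kings = ["Louis XIV", "Louis V", "Louis IX", "Louis XIX"]

# Plain string comparison orders by code point, so "IX" sorts before "V".
string_order = sorted(kings)

# Parsing the numeral in the application yields the numeric order.
numeric_order = sorted(kings, key=regnal_key)

# The same effect appears with decimal digits: "Table 10" sorts before "Table 5".
tables = sorted(["Table 5", "Table 10"])
```

    Here string_order begins with "Louis IX", while numeric_order
    begins with "Louis V"; tables comes out as "Table 10", "Table 5".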

    > I understand the restriction of use for compatibility, but I think that
    > we really lack at least the following figures necessary to write
    > every Roman numeral:
    > I, II, III, IV, V, VI, VII, VIII, IX, X, XL, L, XC, C, CM, M
    > encoded each as a unique character in a continuous sequence, with
    > corresponding numeric properties.

    Not only would this fail to address the full scope of the problem
    of numerals embedded in text (Roman or otherwise) -- it would just
    further complicate the problem of representation of Roman numerals
    in Unicode by putatively adding a *third* way to represent them.
    That is something further guaranteed to confuse people, rather than
    clarifying anything.

    If you want to make progress on handling numerals in text, the
    obvious alternative is to work with marked up text, instead, where
    numerals can be unambiguously tagged as to their scope and
    exact values.
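
    One way the marked-up approach might look, sketched in Python
    with the standard xml.etree.ElementTree module. The <entry> and
    <num> elements and the value attribute are a hypothetical markup
    invented for this illustration, not an existing schema. The
    displayed text stays ordinary Latin letters or digits, while the
    markup carries the exact numeric value used for ordering.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <num value="..."> tags the scope and value of a numeral.
entries = [
    '<entry>Louis <num value="14">XIV</num></entry>',
    '<entry>Table <num value="10">10</num></entry>',
    '<entry>Louis <num value="5">V</num></entry>',
    '<entry>Table <num value="5">5</num></entry>',
    '<entry>Louis <num value="9">IX</num></entry>',
]


def sort_key(markup: str) -> tuple[str, int]:
    """Order by the text before the numeral, then by the tagged value."""
    root = ET.fromstring(markup)
    label = (root.text or "").strip()
    value = int(root.find("num").get("value"))
    return (label, value)


def plain_text(markup: str) -> str:
    """Strip the markup and return only the displayed text."""
    return "".join(ET.fromstring(markup).itertext())


ordered = [plain_text(e) for e in sorted(entries, key=sort_key)]
```

    With these entries, ordered is "Louis V", "Louis IX", "Louis XIV",
    "Table 5", "Table 10", with no parsing of the numeral text itself.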

