Re: Roman numerals

From: Michel Bottin (michel.bottin@free.fr)
Date: Sun Feb 21 2010 - 09:56:13 CST

    >
    > My feeling is that Roman numerals are only written with single-letter digits, and that all other numerals are just
    > precomposed for compatibility reasons (notably JIS and other CJK encodings where they are encoded to be rendered in a
    > single square).
    >
    > Except for use within CJK for rendering within the square of a CJK font, Roman numerals should not be precomposed.
    > There will still be a compatibility mapping from these Roman numeral letters to Latin letters (they are anyway the
    > same since the origin, and deviated only because of a preferred rendering for them in modern texts, where they should
    > preferably not be cursive, and drawn with serifs even if the text is written with a sans-serif font), and some
    > people now want to make them distinct from normal text (which is anyway not written in the Latin language).
    >
    > But if you look further in the past, you'll find many examples of Roman numerals written in a cursive script, and in
    > all the past script designs (including Fraktur, Caroline...), as well as in lowercase, possibly in italic and bold.
    > The distinction with normal text was made by surrounding Roman numbers with an additional middle dot.
    >
    > You'll also note that Roman numerals also exist in small capitals (ordinal century numbers and subchapter numbers,
    > but not millennium numbers and book numbers and main chapter numbers...). We can then go far into typographic
    > conventions, which are not universal and language-dependent today.
    >
    This shows clearly that these "letters" are specific. We should consider
    the Roman numerals not as plain letters but as "letterlike symbols".
    Tons of such symbols are encoded in Unicode for mathematical use. The
    paradox is that the Etruscan numerals (borrowed by the Romans along with
    their alphabet) are encoded in Unicode from U+10320 to U+10323: 𐌠, 𐌡, 𐌢, 𐌣
    with General Category No (Other_Number) and numeric values 1, 5, 10, 50!
    Why not the Roman ones, which are still in use, even if in a restricted way?
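
    These properties can be checked with a minimal sketch, assuming
    Python and its standard unicodedata module. The sample list and the
    describe helper are illustrative choices, not anything defined by
    Unicode. For comparison it also queries a few of the Roman numeral
    compatibility characters starting at U+2160, which report General
    Category Nl (Letter_Number) and decompose to plain Latin letters
    under NFKC.

```python
import unicodedata

# Old Italic numerals (U+10320..U+10323) next to a few Roman numeral
# compatibility characters (U+2160 I, U+2164 V, U+2169 X, U+216B XII).
# The selection of samples is an illustrative assumption.
SAMPLES = [
    "\U00010320", "\U00010321", "\U00010322", "\U00010323",
    "\u2160", "\u2164", "\u2169", "\u216B",
]


def describe(ch):
    """Return (code point, name, General Category, numeric value, NFKC form)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.numeric(ch),
        unicodedata.normalize("NFKC", ch),
    )


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
```

    The NFKC column is the compatibility mapping mentioned in the quoted
    text: the precomposed twelve at U+216B maps to the three letters XII.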
    > I really think that Roman numbers encoded with multiple letters should be avoided as much as possible, unless you
    > are sure that you will never exceed twelve and you want to get a consistent look for the whole sequence up to that
    > number (e.g. hours on a clock cycle, or month numbers). Avoid them for numbering king names, centuries, years.
    >
    The limit of XII for the "precomposed" characters in GB character sets
    is probably due to their use for the hours.
    > So, use the single-letter Roman numerals and compose all numbers with them, and you won't have any more problems.
    > Or use the normal Latin letters, and the rendering font and style of your choice, as well as the optional
    > surrounding punctuation you may want.
    >
    > Collation is a separate issue: when this requires reordering letters or reinterpreting the whole number, this goes
    > out of scope of the Unicode standard, and even of the UCA itself, because this is exactly the same
    > problem for all numeral systems that are complete enough to require using a variable number of digits.
    I am not fully convinced by this argument. The Roman numeral system is
    not directly positional and thus avoids the side effect of a variable
    number of digits.

    In fact, if we limit ourselves to the additive notation, the only seven
    necessary symbols are I, V, X, L, C, D, M. If we consider the more
    common subtractive notation, the number of symbols is only 13: I, IV, V,
    IX, X, XL, L, XC, C, CD, D, CM, M. Such a sequence, encoded at consecutive
    code points, is enough to ensure a numerical ordering with a simple
    lexicographic order based on the code points. IMHO it's a pity that
    Unicode doesn't take advantage of its power to encode cleanly this
    historically important system, still used nowadays even as a residual one.
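
    The ordering claim can be tried out with a small sketch, assuming
    Python. No such characters exist in Unicode, so the symbol
    inventory here is hypothetical: thirteen symbols (the subtractive
    pairs plus the seven letters, X included), with the rank of each
    symbol standing in for its would-be code point. The names SYMBOLS,
    to_roman_tokens, sort_key and is_order_preserving are illustrative.

```python
# Hypothetical inventory, listed by increasing value; the index of a
# symbol plays the role of its (imaginary) consecutive code point.
SYMBOLS = ["I", "IV", "V", "IX", "X", "XL", "L", "XC", "C", "CD", "D", "CM", "M"]
VALUES = [1, 4, 5, 9, 10, 40, 50, 90, 100, 400, 500, 900, 1000]
RANK = {sym: i for i, sym in enumerate(SYMBOLS)}


def to_roman_tokens(n):
    """Greedy conversion of n (1..3999) into a list of symbols."""
    tokens = []
    for sym, val in zip(reversed(SYMBOLS), reversed(VALUES)):
        while n >= val:
            tokens.append(sym)
            n -= val
    return tokens


def sort_key(tokens):
    """Tuple of ranks; compares like a string of consecutive code points."""
    return tuple(RANK[t] for t in tokens)


def is_order_preserving(limit=3999):
    """True if lexicographic order of the keys matches numeric order."""
    keys = [sort_key(to_roman_tokens(n)) for n in range(1, limit + 1)]
    return all(a < b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    print(to_roman_tokens(1994))          # tokens for 1994
    print(is_order_preserving())          # order check over 1..3999
    # With plain Latin letters, code point order puts IX before V.
    print(sorted(["I", "V", "IX", "X"]))
```

    The check relies on the numerals being written in their usual
    greedy form; variants such as IIII next to IV would need a
    convention of their own.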
    > This is a
    > matter of semantic analysis of the decoded streams of characters and it highly depends on the language and on the
    > conventional notation used, both of them being not encoded directly in plain texts.
    >
    > Philippe.
    >


