Re: Roman numerals (was Re: How to write Armenian ligatures?)

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sun Nov 25 2007 - 06:11:29 CST

    James Kass wrote:

    > Based on the text quoted from the standard, it appears
    > that CJK users are welcome to use Roman numerals, but
    > Latin script users are discouraged from doing so.

    More or less so. But discouraged, not forbidden.

    > There are many things I like about the Unicode Standard,
    > but assigning a character and then telling people they
    > mustn't use it isn't one of them.

    Well, it doesn't say "mustn't". Rather "shouldn't", and even this
    probably wouldn't express the tone quite adequately. And I'm sure you
    know the reasoning behind this: Unicode was designed so that it can be
    used as a universal code in the sense that character data in any
    encoding can be mapped to Unicode and vice versa, without losing any
    distinction that might be made in some other character code. (This does
    not mean that conversions must preserve such distinctions as letter I
    vs. Roman numeral I if made in the original data; just that conversions
    _can_ be made that way.)
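
    To make that compatibility relationship concrete, here is a minimal
    Python sketch, using only the standard library's unicodedata module.
    The printed values are what the Unicode Character Database records
    for U+2160:

        import unicodedata

        numeral_one = "\u2160"  # U+2160 ROMAN NUMERAL ONE

        # The character's compatibility decomposition, as recorded in the UCD:
        print(unicodedata.decomposition(numeral_one))  # '<compat> 0049' (0049 = letter I)

        # NFKC normalization applies that mapping, so a conversion *can*
        # fold the numeral to the letter -- erasing the distinction:
        print(unicodedata.normalize("NFKC", numeral_one))  # 'I'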

    The main practical problem with this is that people who only look at the
    Code Charts, paying attention to headings, glyphs, and names of
    characters, easily get misled into using compatibility characters that
    should normally not be used for new data. They may even think they are
    progressive and logical. After all, if "I" is encoded as ROMAN NUMERAL
    ONE, then grammar checkers, speech synthesizers, indexers, and other
    software can know how to deal with it, never taking it as a personal
    pronoun, for example. Besides, programs can select a specific glyph,
    different from that of letter "I", for it. There's much sense in such
    reasoning, except that it's not what the Unicode Standard suggests. And
    this could be the start of a beautiful debate, except that it's too
    late. We are supposed to use Latin letters as Roman numerals, and we are
    supposed to use higher protocol levels, such as markup or commands, to
    distinguish between different uses when needed.
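
    In fairness, part of that reasoning is factual: the dedicated
    characters do carry numeric values in the Unicode Character Database,
    so software really can read them as numbers, which a plain letter
    does not offer. A small Python sketch of this (standard library only;
    the code points shown are just examples):

        import unicodedata

        print(unicodedata.numeric("\u2167"))  # U+2167 ROMAN NUMERAL EIGHT -> 8.0
        print(unicodedata.numeric("\u216F"))  # U+216F ROMAN NUMERAL ONE THOUSAND -> 1000.0

        # The letter I carries no numeric property, so this raises ValueError:
        try:
            unicodedata.numeric("I")
        except ValueError:
            print("letter I has no numeric value")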

    I don't think the Unicode Standard can be interpreted as saying that you
    should not ever use the compatibility characters denoting Roman numerals
    in texts written in the Latin script. You might have reasons to do so,
    perhaps just to be able to render them differently in a system that does
    not support higher-level tools. But you should not expect that the
    distinction will be preserved when the data is transferred to another
    system. In practice, another system might not even recognize the Roman
    numeral characters.
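
    As a hedged sketch of that transfer problem, assume the receiving
    system looks for Roman numerals with the usual letter-based pattern,
    and that the text passes through NFKC normalization somewhere along
    the way, as search and indexing pipelines commonly do. The dedicated
    character first goes unrecognized, and is then silently folded to
    letters:

        import re
        import unicodedata

        text = "Chapter \u216B"  # U+216B ROMAN NUMERAL TWELVE
        letter_pattern = re.compile(r"\b[IVXLCDM]+\b")

        print(letter_pattern.findall(text))  # [] -- the numeral goes unrecognized
        print(letter_pattern.findall(unicodedata.normalize("NFKC", text)))  # ['XII']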

    The Roman numerals, or at least most of them, probably did not
    originate as letters at all but as special symbols derived from
    counting marks, and were only later associated with letters. If there
    were ancient texts that
    make a distinction between, say, letter V and the Roman numeral for
    five, and if such texts needed to be encoded as plain text in digital
    format in considerable quantity, then we would have a case for
    encoding the numerals as independent, non-compatibility characters. In
    this hypothetical situation, new characters would have to be added,
    since the current Roman numerals have compatibility equivalences that
    would not be appropriate for such use.

    On the other hand, anything less than such a situation will hardly cause
    any serious reconsideration of the status of Roman numerals in Unicode.

    Jukka K. Korpela ("Yucca")
    http://www.cs.tut.fi/~jkorpela/


