From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sun Nov 25 2007 - 06:11:29 CST
James Kass wrote:
> Based on the text quoted from the standard, it appears
> that CJK users are welcome to use Roman numerals, but
> Latin script users are discouraged from doing so.
More or less so. But discouraged, not forbidden.
> There are many things I like about the Unicode Standard,
> but assigning a character and then telling people they
> mustn't use it isn't one of them.
Well, it doesn't say "mustn't". Rather "shouldn't", and even this
probably wouldn't express the tone quite adequately. And I'm sure you
know the reasoning behind this: Unicode was designed so that it can be
used as a universal code in the sense that character data in any
encoding can be mapped to Unicode and vice versa, without losing any
distinction that might be made in some other character code. (This does
not mean that conversions must preserve such distinctions as letter I
vs. Roman numeral I if made in the original data; just that conversions
_can_ be made that way.)
The main practical problem with this is that people who only look at the
Code Charts, paying attention to headings, glyphs, and names of
characters, easily get misled into using compatibility characters that
should normally not be used for new data. They may even think they are
progressive and logical. After all, if "I" is encoded as ROMAN NUMERAL
ONE, then grammar checkers, speech synthesizers, indexers, and other
software can know how to deal with it, never taking it as a personal
pronoun, for example. Besides, programs can select a specific glyph for
it, different from that of the letter "I". There's much sense in such
reasoning, except that it's not what the Unicode Standard suggests. And
this could be the start of a beautiful debate, except that it's too
late. We are supposed to use Latin letters as Roman numerals, and we are
supposed to use higher protocol levels, such as markup or commands, to
distinguish between different uses when needed.
I don't think the Unicode Standard can be interpreted as saying that you
should not ever use the compatibility characters denoting Roman numerals
in texts written in the Latin script. You might have reasons to do so,
perhaps just to be able to render them differently in a system that does
not support higher-level tools. But you should not expect that the
distinction will be preserved when the data is transferred to another
system. In practice, another system might not even recognize the Roman
numeral characters.
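The loss of the distinction is not hypothetical: the Roman numeral characters carry compatibility decompositions to ordinary Latin letters, so any process that applies compatibility normalization (NFKC or NFKD) folds them away. A minimal illustration using Python's standard unicodedata module:

```python
import unicodedata

# U+2160 ROMAN NUMERAL ONE is a distinct code point from the letter "I"...
roman_one = "\u2160"
print(roman_one == "I")  # False

# ...but its compatibility decomposition is the letter, so NFKC
# normalization erases the distinction.
print(unicodedata.normalize("NFKC", roman_one))  # I

# U+216B ROMAN NUMERAL TWELVE even decomposes to three separate letters.
print(unicodedata.normalize("NFKC", "\u216B"))  # XII
```

Any search engine, identifier-folding routine, or conversion pipeline that normalizes with NFKC will therefore treat the compatibility numerals as plain letters, whether or not the author intended the distinction.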
The Roman numerals, or at least most of them, were probably not related
to letters at all but were special symbols, originating from tally
marks, that only later became associated with letters. If there were ancient texts that
make a distinction between, say, letter V and the Roman numeral for
five, and if such texts needed to be encoded as plain text in digital
format in considerable quantity, then we would have a case for encoding
the numerals as independent, non-compatibility characters. In this
hypothetical situation, new characters would have to be added, since
the current Roman numerals have equivalences that would not be
appropriate for such use.
On the other hand, anything less than such a situation will hardly cause
any serious reconsideration of the status of Roman numerals in Unicode.
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/