# Re: Roman Numerals (was Re: Improper grounds for rejection of proposal N2677)

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Oct 28 2005 - 12:43:17 CST


Jukka continued, argumentatively:

> > 1) The dedicated Roman numerals only go up to twelve, with spotty support
> > beyond that.
>
> There are a few more, and you could always represent e.g. the numeral for
> thirteen as U+2169 U+2162. So although the argument indirectly refers to
> the fact that the numerals were included for rather specific purposes,
> it's not logically convincing.

Not logically convincing? How about the fact that any objective
analysis of the Roman numeral system would conclude that the relevant
units were the Latin letters, rather than the first twelve complete
numerals + L, C, D, M?

How about the fact that if you presume that the compatibility number
forms are supposed to be used for the representation of Roman numerals
in general, your example for 13 would be multiply ambiguous
in representation:

XIII = <2169, 2162> (<X, III>)
XIII = <2169, 2161, 2160> (<X, II, I>)
XIII = <2169, 2160, 2160, 2160> (<X, I, I, I>)
XIII = <216A, 2161> (<XI, II>)
XIII = <216A, 2160, 2160> (<XI, I, I>)
XIII = <216B, 2160> (<XII, I>)

Once you start down that road, you haven't got a logical leg to stand
on to distinguish between your cases. You'd be arguing for an incoherent
system that *badly* represents Roman numerals.
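As a side note (my illustration, not part of the original post): the ambiguity above is easy to verify mechanically, since all six sequences have the same NFKC compatibility normalization, "XIII".

```python
import unicodedata

# The six representations of XIII listed above, as Roman-numeral
# compatibility characters (U+2160 ONE, U+2161 TWO, U+2162 THREE,
# U+2169 TEN, U+216A ELEVEN, U+216B TWELVE).
sequences = [
    "\u2169\u2162",              # <X, III>
    "\u2169\u2161\u2160",        # <X, II, I>
    "\u2169\u2160\u2160\u2160",  # <X, I, I, I>
    "\u216A\u2161",              # <XI, II>
    "\u216A\u2160\u2160",        # <XI, I, I>
    "\u216B\u2160",              # <XII, I>
]

# Every one of them compatibility-normalizes to the same Latin-letter string.
for seq in sequences:
    print(unicodedata.normalize("NFKC", seq))  # prints "XIII" six times
```

So the distinctions between the six encodings are invisible after normalization, which is exactly why they make a poor representation of the numeral.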

>
> > 2) Without fancy keyboard pyrotechnics, the dedicated Roman numerals would be
> > typed and deleted one at a time. E.g., if a user accidentally entered "VIII"
> > and realized that there was one "I" too many, a backspace would delete the
> > whole thing instead of just the final "I". This is not likely the behavior
> > the user expects.
>
> Probably so, but would this be a problem to users who consciously choose
> to use those numerals?

Yes, it would be a problem. See the above problem with XIII, for example.

> After all, we are not discussing the question whether
> everyone should use them but whether some people could use them. Besides,
> it wouldn't be the end of the world as we know it if a delete function
> behaved in a somewhat unexpected way in such a case.

Nor would it be the end of the world if people didn't do stupid things
with compatibility characters just because they can and then say, "Wow!
I just poked myself in the eye with my finger and it hurts."

> > 3) Having said that, the dedicated Roman numerals would still be appropriate
> > to use in some limited contexts. E.g., if you're laying out Asian text
> > vertically and want the Roman numerals "I" through "XII" to be interspersed
> > *horizontally* in the text.
>
> This is an important practical point. The intended use for the dedicated
> Roman numerals is in such contexts,

It is. These are a bunch more examples of East Asian typographic bullet
characters -- cf. all the stuff in the Enclosed Alphanumerics block,
U+2460..U+24FF. *NONE* of that stuff is intended to be misused as generic
digits for numerical representations nor, for that matter, as generic
letters for textual representations.

> and this implies that their glyphs
> should be expected to reflect that, i.e. to be suitable for use in
> vertical text. Therefore they cannot be typographically very suitable
> for "normal", horizontal text.
>
> There is another argument, of much more general nature. The Unicode
> standard says, in clause 3.7:
>
> "Compatibility decomposable characters ... support transmission and
> processing of legacy data. Their use is discouraged other than for legacy
> data or other special circumstances."
>
> (The Roman numeral characters that we are discussing are compatibility
> decomposable to sequences of Latin letters.)

Correct.
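(A quick check of that claim, again my illustration rather than the original post: the Unicode Character Database records a `<compat>` decomposition for these characters, which maps them to ordinary Latin letters.)

```python
import unicodedata

# U+2169 ROMAN NUMERAL TEN carries a compatibility decomposition
# to U+0058 LATIN CAPITAL LETTER X in UnicodeData.txt.
print(unicodedata.decomposition("\u2169"))  # prints "<compat> 0058"
```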

>
> This all doesn't mean that there would be no _need_ for the dedicated
> Roman numerals. It just means that the arguments against using them,
> outside the specific scope of use, are probably stronger than arguments
> in favor of them. For example, it _would_ be useful if a speech
> synthesizer could read "Charles I" as "Charles the first" rather than
> "Charles eye", and if the "I" were written as a dedicated Roman numeral,
> the software could know that it is unambiguously a number, not e.g. the
> personal pronoun "I". But this won't happen, for a multitude of reasons.
> Such things need to be handled at other protocol levels, such as markup
> (even though there's no useful general-purpose markup for such things at
> present).

Correct. Even though some people can and *do* demonstrate that there
are reasons for distinguishing Roman numerals from non-numeric use
of Latin letters, that doesn't amount to an argument that they should
be separately encoded *as* characters.

--Ken

This archive was generated by hypermail 2.1.5 : Fri Oct 28 2005 - 12:44:06 CST