# Re: Roman Numerals (was Re: Improper grounds for rejection of proposal N2677)

Date: Fri Oct 28 2005 - 12:21:09 CST


Regarding Roman numerals, the combining numerals needed to form the large
numbers are still missing, i.e. the combining C on the right, and the
combining turned C on the left. These should combine with a central I.
Alternatively, the combining C on the right could be a combining C and
reversed C, added after the central I.

The existing CD thousand numeral is in fact a ligature of a central I and
the two symbols. A better (less confusing) name would have been CID (where
the D represents the reversed C, and is not confusing because the Roman
numeral D cannot immediately follow the Roman numeral I except when used in
combination after a leading C), rather than CD, which means 400.

If one prefers, we could avoid encoding the central I by using the existing
CD thousand numeral to mean 1000, and adding combining numerals after it
(or after D meaning 500) to multiply its value by 10.

So the multiplier by 10 could be a single combining character: it would
have the form of a half circle combining on the right if it follows the
Roman D numeral base character, and the form of a surrounding full circle if
it follows the Roman CD-thousand numeral base character.

For now, it is impossible to represent correctly and consistently the Roman
numbers 5,000 and 10,000 (made with double left half-circles or double
circles), 50,000 and 100,000 (made with triple left half-circles or triple
circles)...
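
To make the rule above concrete, here is a minimal sketch in Python. It
assumes the Latin-letter transliteration in which D stands for the reversed C
after a central I, so that ID = 500, CID = 1,000, and each additional layer
multiplies by 10. The function name apostrophus_value is hypothetical and
only serves the illustration.

```python
import re

def apostrophus_value(token):
    """Value of one transliterated group such as 'IDD' or 'CCIDD'.

    Assumption: 'D' stands for the reversed C, and 'C' for the matching
    left half-circle, both written around a central 'I'.
    """
    m = re.fullmatch(r"(C*)I(D+)", token)
    if m is None:
        raise ValueError(f"not a C...I...D group: {token!r}")
    left, right = len(m.group(1)), len(m.group(2))
    if left == 0:
        # Half-circles on one side only: 500, 5,000, 50,000, ...
        return 500 * 10 ** (right - 1)
    if left == right:
        # Full circles around the I: 1,000, 10,000, 100,000, ...
        return 1000 * 10 ** (right - 1)
    raise ValueError(f"unbalanced group: {token!r}")

for group in ["ID", "CID", "IDD", "CCIDD", "IDDD", "CCCIDDD"]:
    print(group, apostrophus_value(group))
```

Unbalanced groups such as CCID are rejected rather than guessed at, since
nothing above assigns them a value.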

The only approximate alternative is to not use the existing Roman numerals
at all, and revert to Latin letters, and then use C, I, and OPEN O (which
looks quite similar to the turned C, except that the serif is missing on
the bottom leg when drawn with serif fonts), or to replace the sequence
<I,TURNED C> by <D>, and possibly add joiner controls between them to
request (and maybe force) their ligature.

So to represent 888,888, you have to write the following sequences with
Latin letters instead of Roman numerals (I add spaces between what should be
combining sequences to make the number easier to read, but these spaces
should not be present, and I use D after I instead of a combining TURNED-C
after I):

IDDDD CCCIDDD CCCIDDD CCCIDDD = 800,000
IDDD CCIDD CCIDD CCIDD = 80,000
IDD CID CID CID = 8,000
D C C C = 800
L X X X = 80
V I I I = 8

This results in the compact string:
IDDDDCCCIDDDCCCIDDDCCCIDDDIDDDCCIDDCCIDDCCIDDIDDCIDCIDCIDDCCCLXXXVIII
which would be much easier to read if it actually used the ligatures of
combining sequences.

--------

Another thing that is missing is the representation of thousand multiples:
it can be either a combining M, stretched above the complete sequence that
it multiplies, or a combining macron that is also stretched over the
complete sequence it multiplies. (Note that there can be several multipliers
stacked above the sequence, which should be a Roman number between 1 and
999).

Using a macron or double macron is very confusing. Try representing
888,888,888 with them, and you'll get something rendered like:

____________
____________ ____________
DCCCLXXXVIII DCCCLXXXVIII DCCCLXXXVIII

This notation was invented after the first one, as it is even easier to
read, and allows writing much larger numbers in a way quite similar to the
modern thousand groups in the positional decimal system.

But to encode it more correctly, one should be able to encode the
thousand multiplier directly (I note it with ° below):
DCCCLXXXVIII°DCCCLXXXVIII°DCCCLXXXVIII
It should be rendered as a macron applied above all preceding Roman numerals.
Alternatively, if one wants to limit the backward string lookup for
rendering, maybe we could encode instead:
DCCCLXXXVIII°°DCCCLXXXVIII°DCCCLXXXVIII
(i.e. the longest string of base characters before the diacritic would be
DCCCLXXXVIII, i.e. between 1 and 12 base characters).

Note that if we don't encode the thousand multiplier at all, then the value
of the string would be ambiguous (although it would not be ambiguous in the
example above).
For example, look at C°C°C (which represents 100,100,100) and compare it to
CCC, which represents 300.
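
As a rough illustration of why the multiplier matters, here is a small Python
sketch that evaluates the first encoding proposed above, where each °
multiplies everything before it by 1,000. The ° is only the placeholder used
in this message, not an encoded character, and the function names are
hypothetical. It does not handle the alternative °° form.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_value(group):
    """Value of a plain Roman number, with the usual subtractive rule."""
    total = 0
    for i, ch in enumerate(group):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (IV, XC, ...).
        if i + 1 < len(group) and ROMAN[group[i + 1]] > value:
            total -= value
        else:
            total += value
    return total

def grouped_value(text, marker="°"):
    """Each marker multiplies everything written before it by 1,000."""
    total = 0
    for group in text.split(marker):
        total = total * 1000 + roman_value(group)
    return total

print(grouped_value("C°C°C"))  # 100100100
print(grouped_value("CCC"))    # 300
print(grouped_value("DCCCLXXXVIII°DCCCLXXXVIII°DCCCLXXXVIII"))  # 888888888
```

Without the marker the two strings C°C°C and CCC collapse into the same
letters, which is exactly the ambiguity described above.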

The only current alternative, using the existing simple macrons in Unicode,
is very hard to compose, unnecessarily lengthy and error-prone (here I also
use ° to denote this Unicode combining macron):

D°°C°°C°°C°°L°°X°°X°°X°°V°°I°°I°°I°°D°C°C°C°L°X°X°X°V°I°I°I°DCCCLXXXVIII

(This sort of string transformation would be better performed by the
rendering engine, before font lookup.)
Also, this does not allow representing the multiplier as a stretched M above
each thousand group.
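
The transformation in question could look roughly like the Python sketch
below. It assumes the group-level encoding with °° and ° shown earlier, and
expands it into one mark per letter. By default it emits U+0304 COMBINING
MACRON; passing mark="°" reproduces the long string above. The function name
expand_multipliers is hypothetical, and no rendering engine is claimed to
work this way.

```python
import re

def expand_multipliers(text, marker="°", mark="\u0304"):
    """Copy the multiplier marks that follow a group onto each of its letters.

    'marker' is the placeholder used in this message for the thousand
    multiplier; 'mark' is what gets written after every letter.
    """
    pattern = rf"([IVXLCDM]+)((?:{re.escape(marker)})*)"
    out = []
    for letters, marks in re.findall(pattern, text):
        count = len(marks) // len(marker)  # number of stacked multipliers
        out.append("".join(ch + mark * count for ch in letters))
    return "".join(out)

compact = "DCCCLXXXVIII°°DCCCLXXXVIII°DCCCLXXXVIII"
print(expand_multipliers(compact, mark="°"))
```

The compact form stays at 39 characters, while the expanded form needs one
or two marks on each of the 24 letters in the multiplied groups.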

Philippe.

----- Original Message -----
From: "Michael Everson" <everson@evertype.com>
To: "Unicode Discussion" <unicode@unicode.org>
Sent: Friday, October 28, 2005 5:17 PM
Subject: Re: Roman Numerals (was Re: Improper grounds for rejection of
proposal N2677)

> At 19:00 +0400 2005-10-28, Andrew S wrote:
>>Michael Everson wrote:
>>> You should use the regular Latin letters.
>>Why?
>
> Fine. Do what you want, if you don't want to take my advice.
> --
> Michael Everson * http://www.evertype.com
