Re: Are these characters encoded?

From: DougEwell2@cs.com
Date: Sat Dec 01 2001 - 16:02:44 EST


At 2001-12-01 11:24:04 Pacific Standard Time,
alsjebegrijptwatikbedoel@yahoo.se (Stefan Persson) wrote:

> I was thinking if this was encoded:
>
> 1.) Swedish ampersand (see "&.bmp"). It's an "o" (for "och", i.e. "and")
> with a line below. In handwritten text it is almost always used instead of
> &, in machine-written text I don't think I've ever seen it.

This might be a character in its own right, as different from the ampersand
as U+204A TIRONIAN SIGN ET. Or it might be simply a glyph variant of the
ampersand. If you have never seen o-underbar in machine-written text, I
doubt that fact will help the case for encoding it separately. You might try
U+006F U+0332 (o followed by COMBINING LOW LINE), though this will probably
not give you the vertical spacing you expect.
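
For anyone who wants to try the U+006F U+0332 sequence, here is a minimal Python sketch (the variable name is only illustrative). It builds the two-code-point string and prints the character names; how the result is drawn still depends entirely on the font and renderer.

```python
import unicodedata

# U+006F LATIN SMALL LETTER O followed by U+0332 COMBINING LOW LINE
o_underbar = "\u006F\u0332"

# List each code point with its Unicode character name.
for ch in o_underbar:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Write the sequence out so it can be checked in a given font or terminal.
print(o_underbar)
```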

(As a side note, this "o-underbar" form reminds me of the "c-underbar" which
is sometimes used in handwritten English to mean "with." Does anyone know
the origin of this symbol? Is it possibly derived from the Latin word cum,
meaning "with"? Does it have any claim to being a character in its own
right?)

> 2.) Fractions with any number, see "bråk.bmp."

U+2044 FRACTION SLASH is exactly what you are looking for. Whether your
browser or other rendering engine will display it the way you want is another
matter.

On page 154 of The Unicode Standard, Version 3.0 (TUS 3.0), there is a
two-paragraph description of the use of U+2044. Note particularly the
sentence:

"The standard form of a fraction built using the fraction slash is defined as
follows: Any sequence of one or more decimal digits, followed by the fraction
slash, followed by any sequence of one or more decimal digits."

This would give you the results you expect for "123/456" but not for "x/y" or
even "14658.48/13789". However, it is not clear to me that this "standard
form" is normative, and it is conceivable that a fraction-slash-aware
renderer could generalize this to "one or more non-space characters, fraction
slash, one or more non-space characters."
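
The difference between the quoted "standard form" and the looser reading can be sketched with two regular expressions. This is only an illustration in Python: the pattern and function names are hypothetical, and the "general" pattern is my guess at what a more generous renderer might accept, not anything TUS specifies.

```python
import re

FRACTION_SLASH = "\u2044"

# Standard form as quoted from TUS 3.0: one or more decimal digits,
# the fraction slash, one or more decimal digits.
STANDARD = re.compile(rf"\d+{FRACTION_SLASH}\d+")

# Hypothetical generalization: a run of non-space characters on each side.
GENERAL = re.compile(
    rf"[^\s{FRACTION_SLASH}]+{FRACTION_SLASH}[^\s{FRACTION_SLASH}]+"
)

def is_standard_fraction(text: str) -> bool:
    """True if the whole string is digits, fraction slash, digits."""
    return STANDARD.fullmatch(text) is not None

def is_general_fraction(text: str) -> bool:
    """True if the whole string fits the looser non-space pattern."""
    return GENERAL.fullmatch(text) is not None

for sample in ("123\u2044456", "x\u2044y", "14658.48\u204413789"):
    print(sample, is_standard_fraction(sample), is_general_fraction(sample))
```

Only the first sample satisfies the standard form; all three satisfy the generalized one. A renderer that merely searches for the standard form inside "14658.48⁄13789" would pick up "48⁄13789" as the fraction, which is presumably not what the writer intended.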

> 3.) Roman numerals. I know I-XII are encoded, but what if you want to use
> higher numbers? Typing "XX," you might suggest.

The set of Roman numerals, at least through 4999, can be completely specified
with the characters U+2160 "I", U+2164 "V", U+2169 "X", U+216C "L", U+216D
"C", U+216E "D", and U+216F "M" (or, of course, with the equivalent Latin
letters). According to TUS 3.0, page 299, "Upper- and lowercase variants of
the Roman numerals through 12, plus L, C, D, and M, have been encoded for
compatibility with East Asian standards." Requests for additional
precomposed Roman numerals will almost certainly be denied.
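
As a rough illustration of building larger numerals from those seven characters, here is a small Python sketch. The function name and the `use_numeral_forms` flag are made up for this example; the table uses the usual subtractive notation and stops at 4999, the limit mentioned above.

```python
# Map each Latin letter to its Roman numeral compatibility character.
NUMERAL_FORMS = {
    "I": "\u2160", "V": "\u2164", "X": "\u2169", "L": "\u216C",
    "C": "\u216D", "D": "\u216E", "M": "\u216F",
}

# Values in descending order, including the subtractive pairs.
VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n: int, use_numeral_forms: bool = False) -> str:
    """Convert 1..4999 to a Roman numeral string.

    With use_numeral_forms=True the result uses U+2160..U+216F
    instead of plain Latin letters.
    """
    if not 1 <= n <= 4999:
        raise ValueError("only 1 through 4999 are supported")
    parts = []
    for value, letters in VALUES:
        count, n = divmod(n, value)
        parts.append(letters * count)
    text = "".join(parts)
    if use_numeral_forms:
        text = "".join(NUMERAL_FORMS[c] for c in text)
    return text

print(to_roman(20))          # plain Latin letters
print(to_roman(2001, True))  # Roman numeral compatibility characters
```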

> This is not always
> sufficient; in Sweden we often put a line under and one above the numbers,
> see "Roma.bmp."

Sounds like a glyph-variant issue. Font designers might want to ensure that
the glyphs for the Roman numeral forms do have the over- and underlines.
Then, if a user doesn't want them, she can always use the plain Latin letters
instead.

> And what about ten thousands? Neither "X¯" nor "X¯" are
> displayed properly!

They should be; that's what the combining characters are there for. (Hint:
you want U+0305 COMBINING OVERLINE, not U+0304 COMBINING MACRON.)

To be fair to Stefan, most rendering engines have a long way to go to catch
up with the Unicode ideal of being able to attach arbitrary combining marks
(like U+0305) to arbitrary base characters (like U+2169). Many renderers
simply replace the sequence with a precomposed glyph. This approach looks
really sharp IF such a glyph is available, but breaks down otherwise.
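
The two marks and the precomposed-versus-not distinction can be checked from Python's `unicodedata` module. This is a sketch under one loud assumption: it uses "NFC normalization shortens the sequence" as a stand-in for "a precomposed character exists," which says nothing about whether a particular font has a suitable glyph. The helper name `has_precomposed` is hypothetical.

```python
import unicodedata

MACRON = "\u0304"     # COMBINING MACRON: short bar, sized to one letter
OVERLINE = "\u0305"   # COMBINING OVERLINE: meant to join with its neighbors
ROMAN_TEN = "\u2169"  # ROMAN NUMERAL TEN

for mark in (MACRON, OVERLINE):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}")

# Ten thousand written as Roman numeral ten with an overline.
ten_thousand = ROMAN_TEN + OVERLINE

def has_precomposed(seq: str) -> bool:
    """Assumption: a sequence has a precomposed character if NFC shortens it."""
    return len(unicodedata.normalize("NFC", seq)) < len(seq)

print(has_precomposed("o" + MACRON))   # o-macron exists as a single character
print(has_precomposed(ten_thousand))   # no single character for X-overline
```

A renderer that relies on precomposed forms has something to fall back on in the first case and nothing in the second, which is where true combining-mark positioning is needed.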

-Doug Ewell
 Fullerton, California


