Re: Are these characters encoded?

From: DougEwell2@cs.com
Date: Sat Dec 01 2001 - 16:02:44 EST

Previous message: John Hudson: "Re: Are these characters encoded?"
Maybe in reply to: Stefan Persson: "Are these characters encoded?"
Next in thread: Michael Everson: "Re: Are these characters encoded?"
Reply: Michael Everson: "Re: Are these characters encoded?"
Reply: G. Adam Stanislav: "Re: Are these characters encoded?"

At 2001-12-01 11:24:04 Pacific Standard Time,
alsjebegrijptwatikbedoel@yahoo.se (Stefan Persson) wrote:

> I was thinking if this was encoded:
>
> 1.) Swedish ampersand (see "&.bmp"). It's an "o" (for "och", i.e. "and")
> with a line below. In handwritten text it is almost always used instead of
> &, in machine-written text I don't think I've ever seen it.

This might be a character in its own right, as different from the ampersand
as U+204A TIRONIAN SIGN ET. Or it might be simply a glyph variant of the
ampersand. If you have never seen o-underbar in machine-written text, I
doubt that this will help your cause much. You might try U+006F U+0332,
though this will probably not give you the vertical spacing you expect.
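To make the suggestion concrete, here is a minimal Python sketch (the variable name is purely illustrative) showing that "o" plus U+0332 COMBINING LOW LINE stays two code points. Unicode has no precomposed o-underbar, so normalization cannot fold it into one character, and how it looks depends entirely on the font and renderer.

```python
import unicodedata

# "o" followed by U+0332 COMBINING LOW LINE: one visible unit,
# two code points.
o_underbar = "\u006F\u0332"

for ch in o_underbar:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# No precomposed o-underbar exists, so NFC leaves both code points.
print(len(unicodedata.normalize("NFC", o_underbar)))
```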

(As a side note, this "o-underbar" form reminds me of the "c-underbar" which
is sometimes used in handwritten English to mean "with." Does anyone know
the origin of this symbol? Is it possibly derived from the Latin word cum,
meaning "with"? Does it have any claim to being a character in its own
right?)

> 2.) Fractions with any number, see "bråk.bmp."

U+2044 FRACTION SLASH is exactly what you are looking for. Whether your
browser or other rendering engine will display it the way you want is another
matter.
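As a small illustration (a sketch, not a rendering test), the following Python builds an arbitrary fraction from digits and U+2044. It also shows that the precomposed U+00BD VULGAR FRACTION ONE HALF has a compatibility decomposition of exactly that shape: digit, FRACTION SLASH, digit.

```python
import unicodedata

FRACTION_SLASH = "\u2044"

# An arbitrary fraction spelled with decimal digits and U+2044.
three_sevenths = "3" + FRACTION_SLASH + "7"

# U+00BD decomposes (compatibility) to "1", U+2044, "2".
half = unicodedata.normalize("NFKC", "\u00BD")
print(half == "1" + FRACTION_SLASH + "2")   # True
print(unicodedata.name(FRACTION_SLASH))      # FRACTION SLASH
```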

On page 154 of TUS 3.0, there is a two-paragraph description of the use of
U+2044. Note particularly the sentence:

"The standard form of a fraction built using the fraction slash is defined as
follows: Any sequence of one or more decimal digits, followed by the fraction
slash, followed by any sequence of one or more decimal digits."

This would give you the results you expect for "123/456" (with U+2044 in place
of the ASCII slash) but not for "x/y" or even "14658.48/13789". However, it is not clear to me that this "standard
form" is normative, and it is conceivable that a fraction-slash-aware
renderer could generalize this to "one or more non-space characters, fraction
slash, one or more non-space characters."
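The two rules can be written as regular expressions. This is a hypothetical Python sketch: `STANDARD` follows the TUS 3.0 wording quoted above, while `GENERALIZED` is only the looser rule speculated about here, not anything the standard defines. Excluding U+2044 from the operands is an extra assumption so that each match contains a single fraction slash.

```python
import re

FRACTION_SLASH = "\u2044"

# TUS 3.0 "standard form": decimal digits, U+2044, decimal digits.
# In Python 3, \d on str patterns matches Unicode decimal digits.
STANDARD = re.compile(rf"\d+{FRACTION_SLASH}\d+")

# Hypothetical generalization: non-space runs around a single U+2044.
GENERALIZED = re.compile(
    rf"[^\s{FRACTION_SLASH}]+{FRACTION_SLASH}[^\s{FRACTION_SLASH}]+"
)

def is_standard_fraction(s: str) -> bool:
    return STANDARD.fullmatch(s) is not None

def is_generalized_fraction(s: str) -> bool:
    return GENERALIZED.fullmatch(s) is not None

for text in ["123\u2044456", "x\u2044y", "14658.48\u204413789"]:
    print(repr(text), is_standard_fraction(text), is_generalized_fraction(text))
```

Only the first sample meets the standard form; all three meet the generalized rule.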

> 3.) Roman numerals. I know I-XII are encoded, but what if you want to use
> higher numbers? Typing "XX," you might suggest.

The set of Roman numerals, at least through 4999, can be completely specified
with the characters U+2160 "I", U+2164 "V", U+2169 "X", U+216C "L", U+216D
"C", U+216E "D", and U+216F "M" (or, of course, with the equivalent Latin
letters). According to TUS 3.0, page 299, "Upper- and lowercase variants of
the Roman numerals through 12, plus L, C, D, and M, have been encoded for
compatibility with East Asian standards." Requests for additional
precomposed Roman numerals will almost certainly be denied.
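To show that the seven characters listed above suffice, here is a minimal Python sketch (the function `to_roman` is hypothetical). It assumes ordinary subtractive notation and lets M repeat, which covers 1 through 4999. Because these Number Forms characters are compatibility characters, NFKC maps them back to the plain Latin letters.

```python
import unicodedata

# The Number Forms characters listed above.
ROMAN = {"I": "\u2160", "V": "\u2164", "X": "\u2169", "L": "\u216C",
         "C": "\u216D", "D": "\u216E", "M": "\u216F"}

# Subtractive-notation table; M may repeat, so 4999 = MMMMCMXCIX.
VALUES = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
          (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
          (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n: int, use_number_forms: bool = True) -> str:
    if not 1 <= n <= 4999:
        raise ValueError("n must be in 1..4999")
    parts = []
    for value, letters in VALUES:
        while n >= value:
            parts.append(letters)
            n -= value
    latin = "".join(parts)
    # Swap each Latin letter for its U+216x counterpart if requested.
    return "".join(ROMAN[c] for c in latin) if use_number_forms else latin

s = to_roman(2001)
print(unicodedata.normalize("NFKC", s) == "MMI")   # True
```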

> This is not always
> sufficient; in Sweden we often put a line under and one above the numbers,
> see "Roma.bmp."

Sounds like a glyph-variant issue. Font designers might want to ensure that
the glyphs for the Roman numeral forms do have the over- and underlines.
Then, if a user doesn't want them, she can always use the plain Latin letters
instead.

> And what about ten thousands? Neither "X¯" nor "X¯" are
> displayed properly!

They should be; that's what the combining characters are there for. (Hint:
you want U+0305 COMBINING OVERLINE, not U+0304 COMBINING MACRON.)
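A minimal Python sketch of that hint, assuming the vinculum convention in which an overline multiplies a numeral by 1000 (so an overlined X means 10,000). The helper `overline` is hypothetical: it just appends U+0305 after each base character.

```python
import unicodedata

OVERLINE = "\u0305"   # COMBINING OVERLINE (not U+0304 COMBINING MACRON)
ROMAN_TEN = "\u2169"  # ROMAN NUMERAL TEN

def overline(s: str) -> str:
    """Attach U+0305 after every base character in s."""
    return "".join(ch + OVERLINE for ch in s)

ten_thousand = overline(ROMAN_TEN)
print([f"U+{ord(c):04X}" for c in ten_thousand])  # ['U+2169', 'U+0305']
print(unicodedata.combining(OVERLINE) != 0)        # True: a combining mark
```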

To be fair to Stefan, most rendering engines have a long way to go to catch
up with the Unicode ideal of being able to attach arbitrary combining marks
(like U+0305) to arbitrary base characters (like U+2169). Many renderers
simply replace the sequence with a precomposed glyph. This approach looks
really sharp IF such a glyph is available, but breaks down otherwise.
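One rough way to see the gap is NFC composition in Python. This is only an analogy: fonts can carry glyphs for sequences that have no code point of their own, so it is not how any particular renderer decides. Still, it shows which base-plus-mark pairs have a precomposed character to fall back on. The helper below is hypothetical and assumes a single-code-point base.

```python
import unicodedata

def has_precomposed_form(base: str, mark: str) -> bool:
    """True if NFC folds base+mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

print(has_precomposed_form("o", "\u0308"))       # True: U+00F6
print(has_precomposed_form("\u2169", "\u0305"))  # False: no such character
```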

-Doug Ewell
Fullerton, California

This archive was generated by hypermail 2.1.2 : Sat Dec 01 2001 - 16:56:49 EST