From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Sep 08 2005 - 11:10:15 CDT
On Thu, 8 Sep 2005, Chris Harvey wrote:
> Many North American Native languages use underlined letters as part of their
> orthographies.
That's new to me. In general, the answer to the question of how to encode
underlined characters is that underlining should normally be expressed in
markup or with a word processor's underlining feature. But if there is a
semantic difference between a letter and an underlined letter, or if you
just wish to make the distinction in plain text, then the options are
- letter followed by combining macron below (U+0331)
- letter followed by combining low line (U+0332, which connects on left and right)
I'd expect the choice to depend on what is common practice for the characters
in printed matter.
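As a rough illustration of the two options above, here is a minimal Python
sketch that builds both sequences for the letter "a" and lists their code
points. The names MACRON_BELOW, LOW_LINE and describe are just illustrative
helpers, not part of any standard API; only the unicodedata module is real.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW
LOW_LINE = "\u0332"      # COMBINING LOW LINE (meant to connect on left and right)


def describe(text):
    """Return a 'U+XXXX NAME' string for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The same base letter, "underlined" in the two possible ways.
with_macron = "a" + MACRON_BELOW
with_low_line = "a" + LOW_LINE

for sequence in (with_macron, with_low_line):
    print(describe(sequence))
```

The two strings are distinct in plain text, so software that compares or
searches them will treat them as different; which one is right depends on
the orthography in question.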
> In a situation where two characters make up one orthographic letter, which is
> underlined, one would use U+035F COMBINING DOUBLE MACRON BELOW.
> Thus ai (underlined ai) would be U+0061 U+035F U+0069
Perhaps. This depends on the interpretation of the underline as macron vs.
low line. If it is low line, you would need to use U+0061 U+0332 U+0069
U+0332.
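To make the two digraph encodings concrete, here is a small Python sketch
that builds both and prints their code points. The helper code_points is a
hypothetical name of mine, used only for display.

```python
DOUBLE_MACRON_BELOW = "\u035f"  # COMBINING DOUBLE MACRON BELOW
LOW_LINE = "\u0332"             # COMBINING LOW LINE


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Macron interpretation: one double diacritic placed between the two letters.
digraph_double_macron = "a" + DOUBLE_MACRON_BELOW + "i"

# Low line interpretation: one low line after each letter.
digraph_low_line = "a" + LOW_LINE + "i" + LOW_LINE

print(code_points(digraph_double_macron))  # U+0061 U+035F U+0069
print(code_points(digraph_low_line))       # U+0061 U+0332 U+0069 U+0332
```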
> But what about situations where three or more characters make up one
> orthographic letter which is underlined, such as aai or aaii? The underline
> should be one long line, not three or four individual MACRON BELOWs.
It would seem unnatural to use a double diacritic followed by a single
diacritic (or vice versa), and there is no triple diacritic we could use.
> I can think of a few options.
> a) aai (all underlined) could have two COMBINING DOUBLE MACRON BELOWs: U+0061
> U+035F U+0061 U+035F U+0069
Sounds illogical, even though it might work in a program that implements
combining double diacritics well.
> b) aai (all underlined) could use three COMBINING LOW LINES (U+0332): U+0061
> U+0332 U+0061 U+0332 U+0069 U+0332
That sounds both logical and practical. The Unicode standard explicitly
refers to such possibilities, in Chapter 7:
"Underlining and Overlining. The characters U+0332 combining low
line, U+0333 combining double low line, U+0305 combining overline,
and U+033F combining double overline are intended to connect on the left
and right. Thus, in combination, they could have the effect of continuous
lines above or below a sequence of characters. However, because of their
interaction with other combining marks and other layout considerations,
such as intercharacter spacing, their use for underlining or overlining of
text is discouraged in favor of using styled text."
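Option b) generalizes to any number of letters. The following Python sketch
shows the idea under a simplifying assumption: the input consists of base
letters only, with no other combining marks, since the passage quoted above
warns about interactions with those. The function names underline and
code_points are hypothetical, and whether the result displays as one
continuous line depends entirely on the font and rendering software.

```python
LOW_LINE = "\u0332"  # COMBINING LOW LINE


def underline(text):
    """Append COMBINING LOW LINE after every character of text.

    Assumes text contains only base letters (no other combining marks).
    """
    return "".join(ch + LOW_LINE for ch in text)


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(underline("aai")))
# U+0061 U+0332 U+0061 U+0332 U+0069 U+0332
print(code_points(underline("aaii")))
# U+0061 U+0332 U+0061 U+0332 U+0069 U+0332 U+0069 U+0332
```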
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/