Re: How to encode underlined characters

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Sep 08 2005 - 11:10:15 CDT


    On Thu, 8 Sep 2005, Chris Harvey wrote:

    > Many North American Native languages use underlined letters as part of their
    > orthographies.

    That's new to me. In general, the answer to the question how to encode
    underlined characters is that they should normally be encoded so that
    underlining is expressed in markup or using a word processor's underlining
    feature. But if there is a semantic difference between a letter and an
    underlined letter, or if you just wish to make the difference in plain text,
    then the options are
    - letter followed by combining macron below (U+0331)
    - letter followed by combining low line (U+0332), which connects on the left and right

    I'd expect the choice to depend on what is common practice for the characters
    in printed matter.
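A minimal Python sketch of the two plain-text options, using only the standard `unicodedata` module; the helper names are hypothetical. One practical difference appears under NFC normalization: for some letters (for example k), the letter plus U+0331 has a precomposed equivalent such as U+1E35 LATIN SMALL LETTER K WITH LINE BELOW, so NFC turns the two code points into one, while the low-line form stays as two code points.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW
LOW_LINE = "\u0332"      # COMBINING LOW LINE (connects on left and right)

def underline_macron(letter: str) -> str:
    """Hypothetical helper: letter followed by COMBINING MACRON BELOW."""
    return letter + MACRON_BELOW

def underline_low_line(letter: str) -> str:
    """Hypothetical helper: letter followed by COMBINING LOW LINE."""
    return letter + LOW_LINE

def describe(s: str) -> list:
    """List each code point as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for form in (underline_macron("k"), underline_low_line("k")):
    print(describe(form))
    # NFC may compose letter + macron below into a precomposed letter.
    print(describe(unicodedata.normalize("NFC", form)))
```

If a text is going to be normalized, the macron-below choice can therefore mix precomposed and decomposed underlined letters, depending on whether a precomposed form exists for that letter.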

    > In a situation where two characters make up one orthographic letter, which is
    > underlined, one would use U+035F COMBINING DOUBLE MACRON BELOW.
    > Thus ai (underlined ai) would be U+0061 U+035F U+0069

    Perhaps. This depends on the interpretation of the underline as macron vs.
    low line. If it is low line, you would need to use U+0061 U+0332 U+0069
    U+0332.
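For the two-letter case, a sketch of both encodings side by side; the helper names are hypothetical, and which encoding is right still depends on whether the mark is read as a macron or as a low line.

```python
DOUBLE_MACRON_BELOW = "\u035F"  # COMBINING DOUBLE MACRON BELOW
LOW_LINE = "\u0332"             # COMBINING LOW LINE

def digraph_double_macron(first: str, second: str) -> str:
    # The double diacritic follows the first base and spans both bases.
    return first + DOUBLE_MACRON_BELOW + second

def digraph_low_line(first: str, second: str) -> str:
    # One low line per base; adjacent low lines are meant to join.
    return first + LOW_LINE + second + LOW_LINE

def codepoints(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(codepoints(digraph_double_macron("a", "i")))  # U+0061 U+035F U+0069
print(codepoints(digraph_low_line("a", "i")))       # U+0061 U+0332 U+0069 U+0332
```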

    > But what about situations where three or more characters make up one
    > orthographic letter which is underlined, such as aai or aaii? The underline
    > should be one long line, not three or four individual MACRON BELOWs.

    It would sound unnatural to use a double diacritic followed by a single
    diacritic (or vice versa), and there is no triple diacritic we could use.

    > I can think of a few options.
    > a) aai (all underlined) could have two COMBINING DOUBLE MACRON BELOWs: U+0061
    > U+035F U+0061 U+035F U+0069

    Sounds illogical, even though it might work in a program that implements
    combining double diacritics well.
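Option (a) can be generated mechanically by joining the letters with U+035F. This sketch (hypothetical helper name; base letters assumed to carry no other combining marks) reproduces the sequence from the question. Note that a one-letter group gets no mark at all, since a double diacritic needs two bases.

```python
DOUBLE_MACRON_BELOW = "\u035F"  # COMBINING DOUBLE MACRON BELOW

def chain_double_macron(letters: str) -> str:
    # Insert U+035F between each pair of adjacent letters, so each double
    # diacritic spans the letter before it and the letter after it.
    # Assumes every character in `letters` is a base letter with no marks.
    return DOUBLE_MACRON_BELOW.join(letters)

seq = chain_double_macron("aai")
print(" ".join(f"U+{ord(c):04X}" for c in seq))
# U+0061 U+035F U+0061 U+035F U+0069
```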

    > b) aai (all underlined) could use three COMBINING LOW LINES (U+0332): U+0061
    > U+0332 U+0061 U+0332 U+0069 U+0332

    That sounds both logical and practical. The Unicode Standard explicitly
    refers to this possibility in Chapter 7:

    "Underlining and Overlining. The characters U+0332 combining low
    line, U+0333 combining double low line, U+0305 combining overline,
    and U+033F combining double overline are intended to connect on the left
    and right. Thus, in combination, they could have the effect of continuous
    lines above or below a sequence of characters. However, because of their
    interaction with other combining marks and other layout considerations,
    such as intercharacter spacing, their use for underlining or overlining of
    text is discouraged in favor of using styled text."
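A sketch of option (b) for runs of any length, again with a hypothetical helper name. It assumes a "cluster" is one base character plus any following characters with a nonzero canonical combining class, and appends U+0332 at the end of each cluster. The last line illustrates one kind of interaction with other combining marks: NFC reorders marks by combining class and may then compose the base with an accent, so a + U+0301 + U+0332 comes out as U+00E1 U+0332.

```python
import unicodedata

LOW_LINE = "\u0332"  # COMBINING LOW LINE

def underline_run(text: str) -> str:
    """Append U+0332 after each base character and any combining marks
    already attached to it, so adjacent low lines can join into one line."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # End of a cluster: next char is absent or is not a combining mark.
        if not nxt or unicodedata.combining(nxt) == 0:
            out.append(LOW_LINE)
    return "".join(out)

print(" ".join(f"U+{ord(c):04X}" for c in underline_run("aai")))
# U+0061 U+0332 U+0061 U+0332 U+0069 U+0332

# Canonical reordering puts U+0332 (class 220) before U+0301 (class 230),
# after which NFC composes a + U+0301 into U+00E1.
print(unicodedata.normalize("NFC", underline_run("a\u0301")) == "\u00e1\u0332")
```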

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    


    This archive was generated by hypermail 2.1.5 : Thu Sep 08 2005 - 11:10:58 CDT