Re: How to encode underlined characters

From: Doug Ewell
Date: Mon Sep 12 2005 - 00:26:47 CDT

    Chris Harvey <chris at languagegeek dot com> wrote:

    > To revisit Carrier. They have underlined s z ts dz. Ẕ has a
    > precomposed character (U+1E94) which is equivalent to Z plus U+0331
    > (COMBINING MACRON BELOW). S̱ does not have a precomposed character,
    > so one would use S plus U+0331 to be consistent with ẕ. It would be
    > unfortunate to encode the underlined TS and DZ with U+0332. If that
    > were to happen, ẕ and s̱ alone would use U+0331, but ẕ and s̱ as
    > part of a digraph would switch to U+0332.
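
    The equivalence described in the quoted paragraph can be checked
    mechanically. The following is only a minimal sketch using Python's
    standard unicodedata module; the variable and helper names are
    illustrative assumptions, not anything taken from the thread.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW
LOW_LINE = "\u0332"      # COMBINING LOW LINE


def nfc(text: str) -> str:
    """Return the canonical composed form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def nfd(text: str) -> str:
    """Return the canonical decomposed form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


z_precomposed = "\u1E94"          # the precomposed Z cited in the thread
z_macron = "Z" + MACRON_BELOW     # Z plus U+0331
z_low_line = "Z" + LOW_LINE       # Z plus U+0332
s_macron = "S" + MACRON_BELOW     # S plus U+0331

# U+1E94 decomposes to Z + U+0331, and that sequence composes back to it.
print(nfd(z_precomposed) == z_macron)
print(nfc(z_macron) == z_precomposed)

# S + U+0331 has no precomposed form, so NFC leaves it as two code points.
print(nfc(s_macron) == s_macron)

# Z + U+0332 is a different sequence and does not compose to U+1E94.
print(nfc(z_low_line) == z_low_line)
```

    The practical consequence is that text using U+0331 and text using
    U+0332 will not compare equal after normalization, so a search for
    one spelling will not find the other.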

    Is it necessary to choose diacritics that are reflected in existing
    precomposed characters? In other words, are you recommending COMBINING
    MACRON BELOW for Carrier instead of COMBINING LOW LINE (a) because
    existing material printed in the Latin orthography for Carrier genuinely
    appears to show a macron-below rather than a low line, or (b) because
    Unicode already has precomposed characters for some of the macron-below

    UTN #19 does suggest that when a brand-new orthography is being created,
    "it is best to try to select a combination from commonly available
    [precomposed] characters" as opposed to combinations of base + combining
    mark. But the Carrier and Shoshon{i,e} orthographies already exist, and
    the only question seems to be how to encode them. They should be
    encoded as what they are.

    > different.

    Stipulated; a low line is definitely wider than a macron-below.

    Can you scan actual examples of Carrier and Shoshon{i,e} printed text,
    so we can see which is preferred for each language, before encoding them

    > Shoshoni however, seems to require the COMBINING LOW LINE as there can
    > be up to four letters with one long underline. Fortunately, there are
    > no precomposed underlined A’s or I’s to cause confusion.
    > I agree that it’s awkward for language x to use the MACRON BELOW while
    > language y uses LOW LINE, but it seems that Carrier may have to use
    > the former, while Shoshoni, the latter.
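
    For the Shoshoni case of one underline spanning several letters, a
    plain-text encoding would presumably repeat U+0332 after each letter.
    The sketch below assumes exactly that; underline() and code_points()
    are hypothetical helpers, and the sample string "ai" is only a
    placeholder, not attested Shoshoni text. Whether adjacent low lines
    actually join into one continuous line depends on the font and
    renderer.

```python
import unicodedata

LOW_LINE = "\u0332"  # COMBINING LOW LINE


def underline(text: str) -> str:
    """Append COMBINING LOW LINE after every character of text (hypothetical helper)."""
    return "".join(ch + LOW_LINE for ch in text)


def code_points(text: str) -> list[str]:
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


sample = underline("ai")  # placeholder letters, not real Shoshoni data
print(code_points(sample))

# No precomposed letters involve U+0332, so NFC leaves the sequence unchanged.
print(unicodedata.normalize("NFC", sample) == sample)
```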

    I suppose I should leave this for others with more experience to decide,
    but it does seem that the distinction will appear arbitrary and will cause
    honest confusion. Perhaps I am wrong.

    Doug Ewell
    Fullerton, California
