Re: Combining marks with two letters

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Feb 11 2008 - 09:03:01 CST

    Andreas Prilop wrote:

    > The LoC romanization rules for Abkhaz(ian) Cyrillic
    > http://www.loc.gov/catdir/cpso/romanization/nonslav.pdf
    > http://www.loc.gov/catdir/cpso/roman.html
    > show a "ts" digraph with a ligature tie or arc above them
    > and a centered dot above this arc
    > How do you write this in Unicode?

    Maybe we should start from the simpler question that was recently
    discussed in news:comp.fonts , namely how to encode a "ts" digraph with
    a ligature tie, or similar other constructs used in the Library of
    Congress transliteration schemes. Two approaches were mentioned: use
    U+0361 between the letters, or use U+FE20 after the first letter and
    U+FE21 after the second one. The latter corresponds to the approach used
    in the document (see the info on character modifiers at the end of the
    document).
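
    As a minimal sketch of the two encodings just described, the following
    Python snippet builds both sequences and lists their code points. The
    variable names and the describe() helper are purely illustrative; only
    the standard unicodedata module is assumed.

```python
import unicodedata

# Approach 1: one double-width combining mark placed between the two letters.
tie_double = "t\u0361s"
# Approach 2: two half marks, one after each letter.
tie_halves = "t\ufe20s\ufe21"


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, seq in (("U+0361", tie_double), ("U+FE20 + U+FE21", tie_halves)):
    print(label)
    for code_point, name in describe(seq):
        print("   ", code_point, name)
```

    Whether either sequence actually renders as a tie spanning both letters
    depends on the font and the rendering software, so the listing says
    nothing about the visual result.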

    Either approach appears to be problematic for practical reasons (limited
    font coverage, poor implementation in fonts, and flaws in rendering
    software). Therefore, I don't think either approach can be ruled out:
    even if one were superior on some theoretical grounds, the other one
    might produce a more reasonable visual presentation.

    Adding a centered dot above the construct appears obvious at some level:
    just use a combining dot above (U+0307). I guess the first problem is
    then where to put it. In the first approach, could we put it after
    U+0361? Hardly. Where then? I don't think there's any way to encode the
    construct in Unicode as currently defined, since we can't designate the
    combining dot above as applying to two characters as a unit. There are things that
    could be done at higher protocol levels to produce the appearance, but
    that's a different issue.
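
    One way to see the difficulty, as a rough sketch only: place U+0307
    directly after U+0361 and look at what canonical normalization does with
    the sequence. The names below are illustrative, and the snippet assumes
    nothing beyond Python's standard unicodedata module.

```python
import unicodedata

DOT_ABOVE = "\u0307"              # COMBINING DOT ABOVE
DOUBLE_INVERTED_BREVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE

# Tentative encoding: t, tie, dot above, s.
attempt = "t" + DOUBLE_INVERTED_BREVE + DOT_ABOVE + "s"

# Canonical combining classes decide how the marks are reordered.
classes = {f"U+{ord(ch):04X}": unicodedata.combining(ch) for ch in attempt}
print(classes)

# NFC reorders the two marks and composes the dot with the letter t.
normalized = unicodedata.normalize("NFC", attempt)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

    The dot above has a lower combining class than the double mark, so
    normalization moves it next to the "t" and composes the two into a
    single dotted letter. The dot is thus treated as belonging to the first
    letter alone, not to the pair.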

    If we interpreted the "ts" as a single character, the ts ligature
    U+02A6 (in the IPA Extensions block), then we would just need to decide
    what the diacritic mark above it (an inverted breve?) is. But the ts
    ligature character
    looks different - it _is_ a ligature, whereas in the transliteration
    scheme, it's really two characters, and the "ligature tie" above them
    indicates that they belong together - it is used _instead of_ using a
    ligature glyph.
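
    For comparison, here is a small sketch of the single-character reading.
    The choice of U+0311 (combining inverted breve) for the arc is only a
    guess made for illustration, as is every name in the snippet.

```python
import unicodedata

TS_DIGRAPH = "\u02a6"      # single character from the IPA Extensions block
INVERTED_BREVE = "\u0311"  # hypothetical stand-in for the arc
DOT_ABOVE = "\u0307"

# One base character followed by two ordinary combining marks.
single_base = TS_DIGRAPH + INVERTED_BREVE + DOT_ABOVE

# Count base (non-combining) characters in each reading.
bases_single = [ch for ch in single_base if unicodedata.combining(ch) == 0]
bases_pair = [ch for ch in "t\u0361s" if unicodedata.combining(ch) == 0]

print(unicodedata.name(TS_DIGRAPH), len(bases_single), len(bases_pair))
```

    With a single base character both marks have an unambiguous target,
    which is exactly what the two-letter encoding lacks.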

    Jukka K. Korpela ("Yucca")
    http://www.cs.tut.fi/~jkorpela/


