Re: Combining Overstruck diacritics

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue May 29 2007 - 09:16:27 CDT


    On Tue, 29 May 2007, "Arne Götje" wrote:

    > what's the Unicode policy for the Combining Overstruck diacritics,
    > especially U+0335 and U+0336?

    I don't remember having seen any policy statements on such issues, beyond
    the properties defined for the characters. Unicode primarily encodes
    characters and defines properties for them, instead of telling you which
    character to use in which situation.

    > Is it appropriate to use
    > <i><U+0336>
    > <I><U+0336>
    > <l><U+0336>
    > <L><U+0336>
    > <u><U+0336>
    > <U><U+0336>
    > in an alphabet

    If you are designing a new alphabet, it is up to you to choose the
    characters. Different choices have different implications. In particular,
    dynamic composition is still problematic (if supported at all) in many
    programs.

    > or should the precomposed ones (U+0268, U+0197, U+019A,
    > U+023D, U+0289, U+0244) be used instead?

    They are _not_ precomposed characters, and there is no defined
    relationship (within Unicode) between them and the sequences you
    mentioned. They may look similar, but they are quite distinct.
    They should not be expected to look the same; rather the opposite.
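
    As an illustration only, the lack of a defined relationship can be
    checked against the Unicode Character Database. The sketch below assumes
    Python's standard unicodedata module; the names STROKE_LETTERS, BASES and
    has_decomposition are made up for this example.

```python
import unicodedata

# The stroke/bar letters from the question, and the base letters of the
# sequences they were compared with (same order).
STROKE_LETTERS = ["\u0268", "\u0197", "\u019A", "\u023D", "\u0289", "\u0244"]
BASES = ["i", "I", "l", "L", "u", "U"]
LONG_STROKE = "\u0336"  # COMBINING LONG STROKE OVERLAY


def has_decomposition(ch):
    """True if the character database lists any decomposition mapping for ch."""
    return unicodedata.decomposition(ch) != ""


for base, stroked in zip(BASES, STROKE_LETTERS):
    sequence = base + LONG_STROKE
    # NFC would merge the sequence into one code point if a canonical
    # composition existed; NFD would split the letter if it had one.
    composed = unicodedata.normalize("NFC", sequence)
    decomposed = unicodedata.normalize("NFD", stroked)
    print(
        f"U+{ord(stroked):04X}",
        "has decomposition:", has_decomposition(stroked),
        "| NFC(sequence) == letter:", composed == stroked,
        "| NFD(letter) == sequence:", decomposed == sequence,
    )
```

    What a given Python build reports depends on the Unicode version that its
    unicodedata module was built with.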

    Unicode does not analyze and decompose letters with a stroke as containing
    a diacritic mark. Instead, they are coded as separate characters. (I've
    never seen an explanation for this, but it's certainly too late to change
    such issues, and the decision is understandable if you consider how the
    "stroke" in letters varies in shape.)

    > Same applies to the LINE BELOW (U+0331 or U+0332?)

    No, that's a different issue, because there are precomposed characters with
    those characters as components.

    > Should <d><D><l><L><r><R><t><T> with line below used as combined
    > diacritics, or as precomposed codepoints?

    It depends. You need to consider the different factors. Unicode just tells
    you that there is canonical equivalence and that there are various
    normalization forms. On the practical side, depending on implementations
    and not on the Unicode standard, the precomposed form (when available)
    is better supported by software and results in better rendering. But there
    are many factors that might make the decomposed form more feasible.
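
    To make canonical equivalence concrete, here is a minimal sketch, again
    assuming Python's unicodedata module. It converts each base letter from
    the question plus a combining mark to NFC and back to NFD, so you can see
    which sequences have a precomposed counterpart. The helper names compose
    and code_points are hypothetical.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW
LOW_LINE = "\u0332"      # COMBINING LOW LINE


def compose(base, mark):
    """Return the NFC form of base followed by a combining mark."""
    return unicodedata.normalize("NFC", base + mark)


def code_points(text):
    """Format a string as a list of U+XXXX labels for printing."""
    return [f"U+{ord(ch):04X}" for ch in text]


for mark in (MACRON_BELOW, LOW_LINE):
    print(unicodedata.name(mark))
    for base in "dDlLrRtT":
        nfc = compose(base, mark)
        nfd = unicodedata.normalize("NFD", nfc)
        # A one-element NFC list means the sequence is canonically
        # equivalent to a single precomposed character.
        print("  ", base, "NFC:", code_points(nfc), "NFD:", code_points(nfd))
```

    Comparing two strings after normalizing both to the same form (NFC or NFD)
    treats canonically equivalent spellings as equal, whichever form the data
    was entered in.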

    > I'm asking, because I need to use <d><D><t><T> with <U+0301> anyways to
    > get the desired glyph...

    I guess you are referring to the practical point I mentioned. Using a
    precomposed character, you can get a glyph designed by a font designer;
    using a combining diacritic mark, you often get an oddly placed mark.
    Theoretically, the rendering engine could map a sequence to the same glyph
    as the one used for a precomposed character, but this is not common.
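
    Whether a precomposed alternative exists at all for a given base letter
    and mark can be tested roughly as follows. This is a sketch under the
    assumption that NFC composition is a good enough indicator (characters
    on the composition exclusion list would be missed); has_precomposed is a
    hypothetical helper, not a library function.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def has_precomposed(base, mark):
    """Approximate check: does NFC merge base + mark into one code point?"""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


# The letters from the question, plus "e" for comparison.
for base in "dDtTe":
    print(base, "+ U+0301 composes under NFC:", has_precomposed(base, ACUTE))
```

    For letters where the check fails, the combining sequence is the only
    way to write the letter, so the result depends on how the font and the
    rendering engine position the mark.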

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

