Re: Combining marks with two letters

From: André Szabolcs Szelp (a.sz.szelp@gmx.net)
Date: Thu Feb 14 2008 - 08:55:59 CST

    Hello, Peter,

    "Adjusting kerning" does not do it. In that phonetic transcription system the acute accent may be applied not only to the cs grapheme as a whole, but also to c and to s separately, as all three sounds represented by c, s and cs may be palatalised. (Similarly, each component of the digraphs sz and zs stands on its own for another palatalisable sound.) If you tweaked the font to accommodate the acute over the digraph, you could no longer represent the acute over the single letters.

    I'll think about writing a proposal (and try to find the time?); however, I'd like to discuss the issue here first.

    Do you support the "INVISIBLE DOUBLE DIACRITIC MARK" (IDDM)? Encoding such a character would have the benefit of being "compatible" (apart from a minor addition to the documentation) with the Abkhazian (was it?) case of double_tie CGJ dot.

    e.g.: LETTER1 IDDM CGJ SIMPLEDIACRITIC LETTER2

    Or would you rather create a "CGJ2" that binds more strongly than the CGJ as currently specified, similar to how CGJ previously behaved with enclosing marks? (Someone explained that, according to the current standard, LETTER1 CGJ LETTER2 DIACRITIC creates a grapheme consisting of LETTER1 and (LETTER2+DIACRITIC).)
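
    As a rough illustration of the current behaviour of LETTER1 CGJ LETTER2 DIACRITIC, here is a minimal Python sketch using only the standard unicodedata module. Taking c and s as LETTER1 and LETTER2 and the acute as DIACRITIC is only an assumption for the example; it shows canonical composition, not rendering.

```python
import unicodedata

CGJ = "\u034f"    # COMBINING GRAPHEME JOINER
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# LETTER1 CGJ LETTER2 DIACRITIC, with c and s chosen for illustration.
seq = "c" + CGJ + "s" + ACUTE

# CGJ has canonical combining class 0, the acute has class 230.
cgj_class = unicodedata.combining(CGJ)
acute_class = unicodedata.combining(ACUTE)

# Under NFC the acute composes with the s that precedes it;
# the c and the CGJ are left untouched.
composed = unicodedata.normalize("NFC", seq)

for line in describe(composed):
    print(line)
```

    In this sketch the acute ends up on LETTER2 alone (s with acute), so a CGJ between the two letters does not by itself make the mark apply to the digraph.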

    This second solution might make sense because it enables a _semantic_ encoding.
    What I did not mention is that there is one trigraph in that writing scheme that can take the same mark. Here a completely correct visual representation can be achieved by D Z COMBININGACUTE S.
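
    For reference, the visual encoding D Z COMBININGACUTE S can be spelled out in code points. This is a minimal Python sketch with the standard unicodedata module; using lowercase letters is an assumption for the example.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# D Z COMBININGACUTE S: the acute follows the middle letter,
# so it sits visually over the centre of the trigraph.
trigraph = "dz" + ACUTE + "s"

# NFC composes z + acute into the precomposed letter U+017A,
# so the sequence is canonically "d, z-with-acute, s".
nfc = unicodedata.normalize("NFC", trigraph)

# NFD restores the original four code points.
nfd = unicodedata.normalize("NFD", nfc)

print([f"U+{ord(ch):04X}" for ch in trigraph])
print([f"U+{ord(ch):04X}" for ch in nfc])
```

    The sketch shows why this encoding is visual rather than semantic: as far as the standard is concerned, the acute belongs to the z alone and not to the trigraph as a unit.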

    A sequence D CGJ2 Z CGJ S COMBININGACUTE would be more semantic, and semantic encoding (as opposed to visual encoding) _is_ one of the concerns of Unicode; that is why we encode Indic text in phoneme order and leave it to "complex rendering engines" to reorder the glyphs, after all.

    Then again, the IDDM could be seen as such a semantic sign in the first case and applied as
    D IDDM Z ACUTEACCENT IDDM S for the trigraph as well?

    Also, I have read several proposals for adding simple characters. This one, however, would need some additional documentation in the standard about rendering and other behaviour, and the proposal section on unicode.org only talks about new characters. Could anyone point me to proposals in which special behaviour is assigned to the proposed character?

    Thanks,
     /Szabolcs

    -- Peter Constable wrote: --
    > However, there is at least one orthography which places a "simple"
    > diacritic centered on a digraph (*without* an additional double-width
    > diacritic)...

    Sometime last year, UTC considered a similar case involving an orthography for a language in (IIRC) Panama, and the decision was that this should be handled as a contextual kerning adjustment -- no additional characters needed to be encoded. Of course, that implies specially-designed fonts.

    If that is not adequate for the case you've mentioned, you might submit a proposal for some other solution.

    Peter


