From: Philippe Verdy (
Date: Fri Apr 15 2005 - 11:56:39 CST

    From: <>
    >> The problem is that I am not sure that this is a normal acute accent.
    >> Maybe this is a double-wide acute accent (sorry for the name but there's
    >> also a "double acute" accent, where double means "repeated twice
    >> side-by-side") which may be encoded separately, with the combining class
    >> 234, and for which no CGJ would be needed (additionally, it would be
    >> possible to put this accent above two letters without the double-wide
    >> inverted breve).
    > However, such a thing (double-wide acute accent) does not exist in
    > Unicode, does it?

    No, it doesn't. I never said it existed: my sentence clearly says it
    would need to be encoded separately, with the combining class 234 used by
    the other "double-wide" accents.
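
    To make the combining-class point concrete, here is a small sketch using
    Python's standard unicodedata module. It only looks at characters that
    already exist (U+0361 COMBINING DOUBLE INVERTED BREVE, U+0301 COMBINING
    ACUTE ACCENT and U+034F CGJ); the variable names are mine. It suggests why
    a CGJ is needed today when an ordinary acute has to stay after a
    double-wide mark:

```python
import unicodedata

DOUBLE_INVERTED_BREVE = "\u0361"  # existing "double-wide" mark
ACUTE = "\u0301"                  # ordinary combining acute accent
CGJ = "\u034f"                    # COMBINING GRAPHEME JOINER

ccc_double = unicodedata.combining(DOUBLE_INVERTED_BREVE)  # class of the double mark
ccc_acute = unicodedata.combining(ACUTE)                   # class of the normal accent

without_cgj = "a" + DOUBLE_INVERTED_BREVE + ACUTE + "e"
with_cgj = "a" + DOUBLE_INVERTED_BREVE + CGJ + ACUTE + "e"

# Canonical reordering sorts adjacent marks by combining class, so the
# acute (lower class) is moved in front of the double-wide breve.
reordered = unicodedata.normalize("NFD", without_cgj)

# CGJ has combining class 0, so it blocks the reordering.
kept = unicodedata.normalize("NFD", with_cgj)

print(ccc_double, ccc_acute)
print([hex(ord(c)) for c in reordered])
print(kept == with_cgj)
```

    A separately encoded double-wide acute with the same class as the
    double-wide inverted breve would not be reordered against it, which is
    why no CGJ would be needed in that case.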

    Sorry, but I really don't like the term "double" applied to diacritics that
    cover two sub-graphemes. My opinion is that they should not have been
    encoded as separate characters. Instead, the standard diacritics should
    have been placed on a zero-width linking base character, similar to ZWJ,
    that combines several grapheme clusters into a single default grapheme
    cluster. It could have been named "grapheme joiner", like this for example:

    - to create a combined grapheme of letters a and e, without ligaturing them,
        <a>, <GJ>, <e>
    - one can create longer combined graphemes if needed, for example to place an
    inverted breve above all of them:
        <a>, <GJ>, <y>, <GJ, combining inverted breve above>, <e>
    which creates a combined grapheme for the three letters <a,y,e> and places a
    linking mark ("inverted breve") above all of them.
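
    The grouping described above can be sketched in a few lines of Python.
    This is only an illustration under stated assumptions: no such GJ
    character exists, so the private-use code point U+E000 stands in for it,
    and the function names combining_sequences and joined_clusters are
    hypothetical. The sketch splits the text into combining sequences, then
    merges any sequences linked by a GJ into one cluster:

```python
import unicodedata

GJ = "\ue000"  # hypothetical "grapheme joiner": a private-use placeholder

def combining_sequences(text):
    """Split text into a base character plus the marks that follow it."""
    seqs = []
    for ch in text:
        if seqs and unicodedata.combining(ch) != 0:
            seqs[-1] += ch   # non-zero combining class: attach to previous base
        else:
            seqs.append(ch)  # new base (GJ counts as a base here)
    return seqs

def joined_clusters(text):
    """Merge combining sequences that are linked by the hypothetical GJ."""
    clusters = []
    join_next = False
    for seq in combining_sequences(text):
        if seq[0] == GJ and clusters:
            clusters[-1] += seq   # the joiner extends the current cluster
            join_next = True
        elif join_next:
            clusters[-1] += seq   # the sequence after a joiner is pulled in too
            join_next = False
        else:
            clusters.append(seq)
    return clusters

two_letters = "a" + GJ + "e"
three_letters = "a" + GJ + "y" + GJ + "\u0311" + "e"  # U+0311 = inverted breve

print(len(joined_clusters(two_letters)))
print(len(combining_sequences(three_letters)), len(joined_clusters(three_letters)))
```

    Text around the joined letters is not affected: in "x" + three_letters +
    "z" the x and z remain clusters of their own.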

    So to encode the example given previously, we would have coded:
        <a>, <GJ, combining inverted breve above, combining acute accent>, <e>
    The two marks stay in that order because the normal combining accents share
    the same combining class 230, so their relative order is preserved by
    normalization.
    (In this example, there are 3 "combining sequences": 2 for the base letters
    and 1 for the complex diacritics, but together they create a single default
    grapheme cluster.)
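
    The normalization claim can be checked with Python's unicodedata module.
    Since the proposed GJ does not exist, this sketch assumes the existing
    CGJ (U+034F, combining class 0) as a stand-in base for the marks; that
    substitution and the helper name is_stable are mine, not part of the
    proposal. Two class-230 marks keep whatever order they were typed in,
    while marks of different classes get reordered:

```python
import unicodedata

GJ_STAND_IN = "\u034f"     # assumption: CGJ standing in for the proposed GJ
INVERTED_BREVE = "\u0311"  # combining class 230
ACUTE = "\u0301"           # combining class 230
DOT_BELOW = "\u0323"       # combining class 220, used only for contrast

breve_then_acute = "a" + GJ_STAND_IN + INVERTED_BREVE + ACUTE + "e"
acute_then_breve = "a" + GJ_STAND_IN + ACUTE + INVERTED_BREVE + "e"

def is_stable(text):
    """True if neither NFC nor NFD changes the text."""
    return all(unicodedata.normalize(form, text) == text
               for form in ("NFC", "NFD"))

# Same class: both orders survive normalization and stay distinct.
print(is_stable(breve_then_acute), is_stable(acute_then_breve))

# Different classes: the lower class (dot below) is sorted first.
mixed = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)
print([hex(ord(c)) for c in mixed])
```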

    The other solution would have been to create separate invisible open/close
    base joining characters, so that several encapsulation levels of graphemes
    could have been created. These would have acted like "meta" punctuation
    (similar to parentheses, except that they don't break the words within which
    they may be inserted, and such that these meta-notations can be easily
    filtered out by processes that want to ignore the diacritics applied to the
    enclosed graphemes).
    This would have been useful to embed notations like those used in grammar
    books for children. This would also have worked like "interlinear
    annotations" (or "ruby layout" in Asian texts), by specifying explicitly in
    the plain text to which sets of encoded graphemes the annotations or
    diacritics apply.
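
    As an illustration of the "easily filtered out" property, here is a
    minimal sketch in Python. Everything specific in it is an assumption:
    the open/close characters do not exist, so the private-use code points
    U+E000 and U+E001 stand in for them, and I assume that the diacritics
    applying to a group are encoded right after its closing character. The
    function name strip_groups is hypothetical:

```python
import unicodedata

GROUP_OPEN = "\ue000"   # hypothetical invisible "open" joiner (private use)
GROUP_CLOSE = "\ue001"  # hypothetical invisible "close" joiner (private use)

def strip_groups(text):
    """Drop the grouping characters and the marks attached to a group.

    Marks attached to individual letters inside a group are kept.
    """
    out = []
    after_close = False
    for ch in text:
        if ch == GROUP_OPEN:
            after_close = False
            continue
        if ch == GROUP_CLOSE:
            after_close = True
            continue
        if after_close and unicodedata.combining(ch) != 0:
            continue  # a mark that applies to the whole group: ignore it
        after_close = False
        out.append(ch)
    return "".join(out)

annotated = ("c" + GROUP_OPEN + "ae" + GROUP_CLOSE + "\u0311\u0301" + "sar")
print(strip_groups(annotated))
```

    Nested groups need no special handling in this filter, since every
    open/close character is simply dropped wherever it occurs.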
    Renderers unable to render those annotations or diacritics over more than a
    single grapheme, since this requires a 3D-capable layout engine, could have
    been allowed not to render the annotation, or to link those annotations in
    another way in the final rendered document (for example, the "ruby layout"
    can be substituted by a linking anchor and a note rendered in a separate
    paragraph, possibly with a smaller font and some indentation, or in the
    page footer).
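
    The fallback for limited renderers might look like the following Python
    sketch. The same caveats apply: the open/close characters are
    hypothetical (private-use U+E000 and U+E001 stand in for them), the
    group's diacritics are assumed to follow the closing character, and the
    "[n]" anchor style and the function name footnote_fallback are only one
    possible choice:

```python
import unicodedata

GROUP_OPEN = "\ue000"   # hypothetical invisible "open" joiner (private use)
GROUP_CLOSE = "\ue001"  # hypothetical invisible "close" joiner (private use)

def footnote_fallback(text):
    """Replace group-level marks by numbered anchors plus a list of notes."""
    body, notes, starts = [], [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == GROUP_OPEN:
            starts.append(len(body))  # remember where this group begins
            i += 1
        elif ch == GROUP_CLOSE:
            start = starts.pop() if starts else len(body)
            span = "".join(body[start:])
            i += 1
            marks = []
            # Collect the marks that apply to the whole group.
            while i < len(text) and unicodedata.combining(text[i]) != 0:
                marks.append(unicodedata.name(text[i], "UNNAMED MARK"))
                i += 1
            if marks:
                notes.append((span, marks))
                body.append("[%d]" % len(notes))  # linking anchor
        else:
            body.append(ch)
            i += 1
    return "".join(body), notes

rendered, notes = footnote_fallback(
    GROUP_OPEN + "ae" + GROUP_CLOSE + "\u0311\u0301")
print(rendered)
for number, (span, marks) in enumerate(notes, start=1):
    print("[%d] %s: %s" % (number, span, ", ".join(marks)))
```

    The notes can then be laid out in a separate paragraph or in the page
    footer, as the renderer prefers.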

    This archive was generated by hypermail 2.1.5 : Fri Apr 15 2005 - 11:57:39 CST