RE: Complex Combining

From: Philippe Verdy (
Date: Fri Nov 28 2003 - 21:20:49 EST

    Peter Kirk writes:
    > On 28/11/2003 01:57, Andrew C. West wrote:
    > >These are all specialised cases that are strictly necessary in order to
    > >represent the respective scripts. General text formatting such as
    > >underlining or arbitrary encirclement of characters (or cartouchement of
    > >ideographs which is common in traditional Chinese texts) is considered to
    > >be "rich text" and beyond the scope of Unicode. Whenever I read threads
    > >like this one (and they resurface with monotonous regularity) I do wonder
    > >whether the participants have ever read TUS Section 2.2 "Unicode Design
    > >Principles".
    > Andrew, I agree with Jill that there is no need to get ad hominem. You
    > will see that I anticipated your objection. I listed several cases where
    > a combining mark might need to be associated with a group of characters,
    > and suggested that some might be dealt with as "rich text". You have
    > confirmed what I wrote. Some of my cases have already been encoded in
    > Unicode, and in just the way I suggested; others are considered (by the
    > UTC, or just by you?) as "rich text". Like Jill, I see some possible
    > inconsistency. One point of this discussion is perhaps to determine if
    > we ought to try to make things more consistent.

    I don't think it is a matter of consistency here: the only thing that
    matters is whether the absence of such grouping for diacritics can produce
    semantically incorrect text, or text whose semantics are ambiguous. I agree
    with Peter here that we have good examples where the simple model with a
    single base character and one or more diacritics is too limited to represent
    the text correctly.

    For mathematics, we can still use parentheses to group the items to which an
    operator applies, but this just adds to the complexity of reading a formula
    (readers must count the parentheses themselves in a plain-text file, simply
    because there's no formatting, and a renderer has no way to render plain
    text without these parentheses). So there are cases where visible
    parentheses are not desirable, but where invisible parentheses would allow
    operators to be grouped correctly.

    For cartouches, the solution used in music is possible, but there are other
    cases, like the need to use combining diacritics on more than one character
    (how do you denote a vector by its two points? How do you correctly surround
    a hieroglyphic cartouche? How do you mark in the text the coloring
    diacritic, or the thick upper bar that denotes it and that HAS textual
    semantics?)

    I think it would be simple to have invisible parentheses in that case, and
    be able to apply the diacritic to the group:

    <invisible open bracket><diacritics><one or more characters><invisible close bracket>

    Then renderers have several options to display it: either use a true 2D
    layout where the diacritic is drawn over the whole group as if it were a
    single base character, or apply the diacritic only to an open bracket glyph
    such as a dotted parenthesis glyph, or a dotted square containing a
    parenthesis glyph (this second solution would probably be used in 1D
    font-based renderers, the first one being supported only in some cases, or
    by more advanced 2D layout engines).
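
    As a rough illustration of the sequence above, here is a minimal Python
    sketch. It rests on assumptions: no invisible bracket characters are
    encoded in Unicode, so the two Private Use Area code points and the
    function names split_groups and render_1d are hypothetical placeholders.
    It shows the split into (diacritics, content) groups that a 2D layout
    engine could consume, and a 1D fallback that attaches the diacritic to a
    visible parenthesis. Nested groups are not handled.

```python
import unicodedata

# Hypothetical placeholders: no such characters are encoded in Unicode.
# Private Use Area code points stand in for the proposed invisible brackets.
INVISIBLE_OPEN = "\uE000"
INVISIBLE_CLOSE = "\uE001"


def split_groups(text):
    """Split text into plain strings and (marks, content) groups."""
    parts = []
    pos = 0
    while pos < len(text):
        start = text.find(INVISIBLE_OPEN, pos)
        if start == -1:
            parts.append(text[pos:])
            break
        end = text.find(INVISIBLE_CLOSE, start + 1)
        if end == -1:
            raise ValueError("unbalanced invisible bracket")
        if start > pos:
            parts.append(text[pos:start])
        inner = text[start + 1:end]
        # Leading combining marks (general category M*) apply to the group.
        n = 0
        while n < len(inner) and unicodedata.category(inner[n]).startswith("M"):
            n += 1
        parts.append((inner[:n], inner[n:]))
        pos = end + 1
    return parts


def render_1d(text):
    """Fallback: attach the marks to a visible opening parenthesis."""
    out = []
    for part in split_groups(text):
        if isinstance(part, str):
            out.append(part)
        else:
            marks, content = part
            out.append("(" + marks + content + ")")
    return "".join(out)


# A vector noted by its two points: U+20D7 COMBINING RIGHT ARROW ABOVE on "AB".
sample = "x " + INVISIBLE_OPEN + "\u20D7" + "AB" + INVISIBLE_CLOSE + " y"
print(split_groups(sample))
print(render_1d(sample))
```

    A 2D engine would draw the arrow across the whole "AB" group, while the
    fallback only decorates the opening parenthesis, which is the distinction
    drawn between the two rendering options.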

    In fact, within the Unicode charts, the invisible open/close pair should
    show the dotted square glyph with the parenthesis as its representative
    glyph, with this glyph being mirrored in BiDi environments.

    It is still consistent with the current encoding of music notation, and with
    the current encoding of parenthesis pairs: the only difference is that
    these characters have no defined width and are preferably invisible. It is
    also consistent with the semantic notation of invisible mathematical
    operators in Unicode. I don't think it's a hack (it is much less of a hack
    than the double-width diacritics that have been encoded, or even the halves
    of double-width diacritics which are also encoded).
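
    For readers who want to look at the precedents mentioned here, the short
    Python sketch below prints a few already-encoded characters using the
    standard unicodedata module. The selection of code points is an
    assumption about which characters are meant; the message does not name
    them.

```python
import unicodedata

# Assumed examples of the precedents mentioned in the text.
PRECEDENTS = [
    "\u2061",  # invisible mathematical operators
    "\u2062",
    "\u2063",
    "\u0360",  # double-width diacritics
    "\u0361",
    "\uFE20",  # halves of double-width diacritics
    "\uFE21",
]

for ch in PRECEDENTS:
    # Print code point, general category, and character name.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```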

    It's true that encoding them would seem to allow some rich-text styles like
    underlining to be encoded, but I think that giving these characters the
    properties of punctuation would discourage their use for underlining
    arbitrary sequences of characters within words (better achieved by
    rich-text formatting), and that allowing the "invisible parentheses" to be
    displayed with a glyph if needed would in practice deter their use just for
    underlining text. For example, to apply a macron below to a grouped text,
    one could think about using:

    <open invisible parenthesis><combining macron below>text to underline<close
    invisible parenthesis>

    But this could be rendered in a compliant way either by underlining the
    surrounded text, or by underlining only the leading dotted parenthesis
    glyph. So it would not reliably have the desired effect for underlining.
    However, it would be correctly interpreted as a semantic notation.

    There are already diacritics that could be used to create a cartouche:
    notably the "combining enclosing screen" character, the "combining
    enclosing keycap" or the "combining enclosing square".
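
    The three enclosing marks named above can be inspected with Python's
    unicodedata module. The sketch below also shows why they do not cover a
    group on their own: an enclosing mark attaches to the single preceding
    base character. The clusters helper is a hypothetical, simplified
    grouping (a base followed by its marks), not the full segmentation
    algorithm of the Unicode standard.

```python
import unicodedata

# U+20DE square, U+20E2 screen, U+20E3 keycap.
ENCLOSING = ["\u20DE", "\u20E2", "\u20E3"]
for ch in ENCLOSING:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")


def clusters(text):
    """Simplified grouping: each mark joins the character before it."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


# The enclosing square ends up on "B" alone, not on the pair "AB".
print(clusters("AB\u20DE"))
```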

    This archive was generated by hypermail 2.1.5 : Fri Nov 28 2003 - 22:04:08 EST