RE: Combining sequences (was: Unicode Public Review Issues update)

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Thu Jul 03 2003 - 09:50:05 EDT

    (Disregarding your netiquette breach of quoting an off-list message on
    a list...)

    > Unicode characters can be called "deprecated", or strongly
    > discouraged; however, they are still valid, and then it's best to
    > describe what their correct behavior should be. My question was there
    > only for completeness, something that the Public Review Issues is
    > supposed to enhance and document officially, even for "deprecated"
    > characters.

    There is no point in doing undue work for characters that shouldn't be
    used (whether deprecated or not).

    > > The (typographic) dot(s) above should be removed if there is a
    > > combining character of class 230 [centred above] in a combining
    > > sequence starting with a soft-dotted character. The file
    > > UCD-4.0.0.html only says "An accent placed on these characters...";
    > > but the "on" here should be interpreted as "class 230". That could
    > > be clarified.
    >
    > Thanks for admitting that the current description may easily be
    > misread as meaning "any diacritic".

    Well, it was not my formulation. I've always referred to "(current)
    combining class 230". I might not have written it explicitly every time
    though.
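
    To make the "class 230" reading concrete, here is a minimal sketch in
    Python. It is only an illustration, not anything from the UCD files:
    the function name should_remove_dot is made up, and SOFT_DOTTED_SAMPLE
    is a small hand-picked subset of the soft-dotted characters (Python's
    unicodedata module does not expose that property). It also assumes the
    combining sequence ends at the first character that is not a combining
    mark.

```python
import unicodedata

# Hypothetical, partial sample of soft-dotted characters; the full list
# lives in the Unicode Character Database, not in Python's unicodedata.
SOFT_DOTTED_SAMPLE = {"\u0069", "\u006A", "\u012F", "\u0456", "\u0458"}


def should_remove_dot(seq: str) -> bool:
    """Return True if the typographic dot of the leading soft-dotted
    character should be dropped, i.e. if the combining sequence that
    starts at seq[0] contains a mark of combining class 230."""
    if not seq or seq[0] not in SOFT_DOTTED_SAMPLE:
        return False
    for ch in seq[1:]:
        # Assumption: anything that is not a combining mark (general
        # category M*) ends the combining sequence.
        if not unicodedata.category(ch).startswith("M"):
            break
        if unicodedata.combining(ch) == 230:
            return True
    return False
```

    Under these assumptions, "i" followed by U+0301 (class 230) loses its
    dot, while "i" followed only by U+0323 (class 220, below) keeps it.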

    > With such a misreading, a simple font renderer may just check the
    > presence of the first diacritic to use a dotless glyph,

    Not likely, though. I do think typographers are smart too. ;-)

    > I would like to have exact comments of what "on" means: does it *only*
    > refer to the class 230? What is the impact of format controls inserted
    > in a combining sequence,

    They break the combining sequence. Applying a combining character
    to a format control is, while legal, not something that has a
    well-defined behaviour.
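
    As a rough sketch of what "break the combining sequence" means here
    (Python again; combining_sequences is a made-up helper, and treating
    every non-mark character, format controls included, as the start of a
    new sequence is a simplifying assumption):

```python
import unicodedata


def combining_sequences(text: str) -> list[str]:
    """Split text into runs of one starting character followed by the
    combining marks (general category M*) that come directly after it."""
    seqs: list[str] = []
    for ch in text:
        if seqs and unicodedata.category(ch).startswith("M"):
            # A combining mark attaches to the sequence already open.
            seqs[-1] += ch
        else:
            # Any other character, including a format control such as
            # U+200D, starts a new sequence.
            seqs.append(ch)
    return seqs
```

    With this split, the acute in "i", U+200D, U+0301 ends up attached to
    the format control rather than to the "i".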

    ...
    > - Hangul syllables are very well defined

    Hangul is a problem case. But I will not go into that here and now.

                    /kent k


