Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (
Date: Fri Aug 08 2003 - 19:11:15 EDT


    On Friday, August 08, 2003 9:54 PM, Peter Kirk <> wrote:

    > On 08/08/2003 08:54, Philippe Verdy wrote:
    > But I'm not sure that ZERO WIDTH SYMBOL is the best name, unless you
    > are suggesting other uses in which it really has zero width. Well, it
    > might have in a case like line initial holam which shifts on to a
    > following silent alef, but that is a rather special case.

    I just picked "SYMBOL" to reflect the required property, matching that of the other spacing variants of diacritics. "ZERO WIDTH" is probably confusing, but it only marks the fact that the character has no associated glyph and a null *minimum* width (which expands to the width of the largest diacritic(s) with which it is combined).

    Its main role would be to fill the gap left by the missing spacing versions of existing diacritics.

    What about the name "INVISIBLE CARRIER SYMBOL"? Note that I avoid any occurrence of the term "COMBINING" in the name, because there would be no requirement for this character to be followed by any diacritic(s). The character would itself be handled as a symbol, in a way similar to the existing spacing diacritics, which already have general category Sk and are conceptually a combination of the INVISIBLE CARRIER SYMBOL and a diacritic, defined for compatibility purposes as an approximation of the sequence SPACE + diacritic.

    It is worth noting that, for now, it is quite tricky to display an isolated diacritic without getting misleading results. In some cases the only way to do it is to use what Unicode describes as "defective" combining sequences, which are not illegal by themselves but whose rendering and interpretation are not guaranteed.
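    A minimal sketch of what "defective" means in practice, in Python with the standard unicodedata module. It assumes a simplified reading of the definition: a run of combining marks (gc=Mn, Mc or Me) that is not preceded by a base character, with "base character" taken here to be anything in categories L, N, P, S or Zs. The function names are hypothetical, and the check ignores finer points such as ZWJ/ZWNJ inside combining sequences.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def is_base(ch: str) -> bool:
    """Simplified base-character test (assumption): L, N, P, S categories, or Zs."""
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "N", "P", "S") or cat == "Zs"


def defective_positions(text: str) -> list:
    """Indices where a run of combining marks starts without a preceding base character."""
    positions = []
    attached = False  # does the current run of marks follow a base character?
    for i, ch in enumerate(text):
        if is_mark(ch):
            starts_run = i == 0 or not is_mark(text[i - 1])
            if starts_run and not attached:
                positions.append(i)
        else:
            # Any non-mark resets the state: controls such as "\n" are not bases.
            attached = is_base(ch)
    return positions


# A line-initial holam (U+05B9) after a newline, then the same mark carried
# by a SPACE and by U+25CC DOTTED CIRCLE.
for sample in ("\n\u05b9\u05d0", " \u05b9", "\u25cc\u05b9"):
    print(repr(sample), defective_positions(sample))
```

    The holam after the newline is reported at index 1, while either a SPACE or U+25CC DOTTED CIRCLE as the base makes the sequence non-defective, which is exactly the trade-off between the two workarounds discussed here.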

    Conversely, Unicode does offer a standard way to force the appearance of a dotted circle with an isolated diacritic: use a dotted circle symbol as the base character. That result, however, may not always be desirable.

    As someone on this list corrected me, SPACE + combining diacritic is admitted in the standard, but only as the compatibility decomposition of spacing diacritics. In fact the isolated spacing diacritic is really a symbol (gc=Sk), unlike the base SPACE character used in the compatibility decomposition (gc=Zs). This means that SPACE + combining diacritic does not have the same textual semantics as the spacing diacritics that are already encoded. All of those seem to have gc=Sk and are not considered letters (gc=Lo), which is why I thought the name "SYMBOL" was accurate.
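    To see the distinction concretely, here is a small Python sketch using the standard unicodedata module (the helper name analyse and the sample list are only illustrative). It prints the general category of a few spacing diacritics and of the characters in their NFKD compatibility decompositions, and checks that NFKC folding does not bring the Sk character back.

```python
import unicodedata

# A few spacing diacritics (gc=Sk); U+0060 GRAVE ACCENT is also Sk but has
# no compatibility decomposition, so it is returned unchanged.
SAMPLES = ["\u00b4", "\u00a8", "\u02d8", "\u02dc", "\u0060"]


def analyse(ch: str) -> str:
    """Describe a character, its general category, and its NFKD decomposition."""
    nfkd = unicodedata.normalize("NFKD", ch)
    parts = " + ".join(f"U+{ord(c):04X} (gc={unicodedata.category(c)})" for c in nfkd)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (gc={unicodedata.category(ch)}) -> NFKD {parts}"


for ch in SAMPLES:
    print(analyse(ch))

# NFKC does not recompose SPACE + U+0301 into U+00B4: the Sk symbol is lost.
print(unicodedata.normalize("NFKC", "\u00b4") == " \u0301")
```

    The decomposed form starts with U+0020 (gc=Zs) followed by a combining mark (gc=Mn), so once a spacing diacritic has been folded, nothing in the text records that it was a symbol.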

    I also tried to justify a possible code point assignment at U+20CF, where it would group more logically: the U+02XX block is already full, and the U+20XX range is used both for symbols (including currency signs) and for a set of additional combining diacritics. Of course, U+20CF is only a suggestion, not something approved or documented.
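    The neighbourhood of the suggested slot can be inspected the same way. This sketch (describe is a hypothetical helper name) prints the general category and name of U+20AC EURO SIGN in the Currency Symbols block (U+20A0..U+20CF), of U+20CF itself (the last code point of that block), and of U+20D0, the first character of the Combining Diacritical Marks for Symbols block. The output depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(cp: int) -> str:
    """General category and name of a code point; unassigned ones get a placeholder."""
    ch = chr(cp)
    # unicodedata.name raises ValueError for unassigned code points unless a default is given.
    name = unicodedata.name(ch, "<unassigned>")
    return f"U+{cp:04X} gc={unicodedata.category(ch)} {name}"


# A currency sign, the suggested slot, and the first combining mark for symbols.
for cp in (0x20AC, 0x20CF, 0x20D0):
    print(describe(cp))
```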

    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.

    This archive was generated by hypermail 2.1.5 : Fri Aug 08 2003 - 19:41:04 EDT