Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (
Date: Sat Aug 09 2003 - 18:31:19 EDT

    On Saturday, August 09, 2003 11:14 PM, Peter Kirk <> wrote:

    > On 09/08/2003 13:41, John Cowan wrote:
    > > Peter Kirk scripsit:
    > >
    > > > The gap may not be large, but Philippe, John H and I have
    > > > identified a real gap. Why this antagonism against filling it?
    > >
    > > What you have identified is a set of implementation defects, not
    > > problems with the Unicode Standard. The standard way to do what
    > > you want is to precede the combining mark with SP or NBSP. If that
    > > "doesn't work", then the implementation that makes it not work
    > > needs to be fixed.
    > >
    > Tell Microsoft! (See Noah Levitt's posting.)

    And the W3C and SGML committees, with their *ML character model!

    > If this is indeed "The standard way to do what you want", then the
    > standard needs to make it clear that the sequence of <space, combining
    > mark> or <NBSP, combining mark> has the properties which I want, i.e.
    > it has the width of the combining mark alone, and not the full width
    > of a space, and does not expand for justification, is not a line
    > breaking opportunity, does not in fact have any of the properties of
    > a space. I expect to see such a clarification in the next edition of
    > the Unicode Standard.
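    A minimal Python sketch of why <space, combining mark> and <NBSP,
    combining mark> still behave like spaces in ordinary text processing.
    The sample strings (alef, alef, carrier, holam, alef) are hypothetical,
    and the built-in str.split() stands in for a naive word tokenizer; real
    line breakers follow their own rules. With the default split, both
    carriers act as word separators and leave the mark at the head of a new
    token; splitting on U+0020 alone catches only SPACE.

```python
# Hypothetical sample strings: alef, alef, a carrier, then holam + alef.
ALEF = "\u05D0"   # HEBREW LETTER ALEF
HOLAM = "\u05B9"  # HEBREW POINT HOLAM, a combining mark (category Mn)

with_space = ALEF + ALEF + "\u0020" + HOLAM + ALEF
with_nbsp = ALEF + ALEF + "\u00A0" + HOLAM + ALEF

# str.split() with no argument splits on any Unicode whitespace,
# and Python counts NBSP (general category Zs) as whitespace.
print(len(with_space.split()))    # 2
print(len(with_nbsp.split()))     # 2

# Splitting on U+0020 alone leaves the NBSP sequence in one token.
print(len(with_nbsp.split(" ")))  # 1

# After the default split, the second token starts with the bare mark.
print(with_space.split()[1][0] == HOLAM)  # True
print(with_nbsp.split()[1][0] == HOLAM)   # True
```

    The difference between the two splits illustrates the point: whether
    the carrier "has the properties of a space" depends on each tool's own
    definition of whitespace, not on the combining mark.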

    Don't forget the issues created by the fact that, in many cases, there is
    no other way than to use "defective" sequences, hoping that the
    implementation will render the diacritic alone (not on a dotted circle)
    and will space it correctly. For now, the workaround of placing some
    (unspecified) control character before the diacritic is really a trick:
    it is not interoperable, and it complicates plain-text search, because
    there is no predictable or stable base character against which to match
    the diacritic. In addition, many input methods and keyboard drivers will
    not let you enter such "defective" sequences, which means that, for
    example, the word "Yerushala(y)im" cannot be entered and searched for
    exactly within a large text, as the implied invisible letter has no
    stable representation.
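    A minimal sketch of the search problem, assuming Python's standard
    unicodedata module. The carriers below (none, SPACE, NBSP, CGJ, ZWJ) are
    hypothetical choices different writers might put before an isolated
    holam; the standard mandates none of them. No two of the resulting
    sequences are canonically equivalent, so NFD normalization does not
    unify them, and an exact search for one form misses the others.

```python
import unicodedata

HOLAM = "\u05B9"  # HEBREW POINT HOLAM (a nonspacing mark, category Mn)

# Hypothetical carriers placed before an isolated holam.
CARRIERS = {
    "none (defective)": "",
    "SPACE": "\u0020",
    "NBSP": "\u00A0",
    "CGJ": "\u034F",
    "ZWJ": "\u200D",
}

# Normalize each variant to NFD; canonical equivalence would make them equal.
forms = {name: unicodedata.normalize("NFD", c + HOLAM)
         for name, c in CARRIERS.items()}

# Exact search using the SPACE variant as the needle.
needle = forms["SPACE"]
hits = [name for name, form in forms.items() if needle in form]

print(len(set(forms.values())))  # 5: no two variants normalize alike
print(hits)                      # ['SPACE']
```

    Searching for the bare mark instead matches every variant, but it also
    matches every ordinary holam written on a letter, so neither needle is
    exact.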

    Note that the CGJ solution will not work when the isolated diacritic must
    appear at the start of a word or breakable token. In that case, the SPACE
    solution is really tricky, because of the special treatment SPACE
    receives, notably in HTML, SGML, XML and often SQL, which "normalize"
    whitespace.
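    A minimal sketch of how whitespace normalization can strip a word-initial
    SPACE carrier. The helper collapse() is hypothetical: it emulates the XML
    Schema whiteSpace="collapse" rule (tab, CR and LF become SPACE, runs of
    SPACE shrink to one, leading and trailing SPACE are removed). Since NBSP
    is not XML whitespace, an NBSP carrier survives the same treatment.

```python
import re

def collapse(value: str) -> str:
    """Hypothetical emulation of XML Schema whiteSpace="collapse"."""
    value = re.sub(r"[\t\r\n]", " ", value)  # tab, CR, LF -> SPACE
    value = re.sub(r" {2,}", " ", value)      # runs of SPACE -> one SPACE
    return value.strip(" ")                   # drop leading/trailing SPACE

HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ALEF = "\u05D0"   # HEBREW LETTER ALEF

# Hypothetical token values that start with an isolated holam.
token_space = " " + HOLAM + ALEF
token_nbsp = "\u00A0" + HOLAM + ALEF

print(collapse(token_space) == HOLAM + ALEF)  # True: now a defective sequence
print(collapse(token_nbsp) == token_nbsp)     # True: NBSP is untouched
```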

    Fortunately, the existing spacing diacritics do not have these problems,
    as they are not canonically equivalent to the suggested SPACE+diacritic
    sequence (they are only compatibility equivalent to it). However, this is
    only part of a solution, and only for some diacritics (not ALL): it covers
    their use as symbols, but not as regular letters within a word alongside
    other letters.
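    A small check of the equivalence claim using Python's unicodedata module.
    The three spacing diacritics below are examples picked for illustration.
    Each has no canonical decomposition, so NFD leaves it unchanged, but its
    compatibility decomposition is SPACE followed by the corresponding
    combining mark, so NFKD turns it into exactly that pair.

```python
import unicodedata

# Spacing diacritics paired with the combining mark each one is
# compatibility-equivalent to when preceded by SPACE.
SPACING = {
    "\u00B4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
    "\u00A8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u02DC": "\u0303",  # SMALL TILDE -> COMBINING TILDE
}

for spacing, combining in SPACING.items():
    pair = " " + combining
    # Not canonically equivalent: NFD leaves the spacing character alone.
    assert unicodedata.normalize("NFD", spacing) == spacing != pair
    # Only compatibility equivalent: NFKD yields SPACE + combining mark.
    assert unicodedata.normalize("NFKD", spacing) == pair
    print(f"U+{ord(spacing):04X}", unicodedata.decomposition(spacing))

# Prints:
# U+00B4 <compat> 0020 0301
# U+00A8 <compat> 0020 0308
# U+02DC <compat> 0020 0303
```

    Because each spacing diacritic is a single non-whitespace character, the
    SPACE only appears if a process deliberately applies compatibility
    (NFKC/NFKD) folding.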

    So there are really two gaps: a small gap for the missing spacing
    diacritics used as symbols, and a large gap for all isolated diacritics
    used within a word (which the CGJ solution solves only in the middle or
    at the end of a word, not at its start).
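    A minimal sketch of why CGJ cannot help at the start of a word, assuming
    Python's unicodedata module. CGJ (U+034F) is itself a nonspacing mark
    (general category Mn, combining class 0), so a word that begins with CGJ
    still begins with a mark and has no base character. The helper
    has_base() and the sample words are hypothetical and deliberately
    simplistic.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ALEF = "\u05D0"   # HEBREW LETTER ALEF
LAMED = "\u05DC"  # HEBREW LETTER LAMED

def has_base(word: str) -> bool:
    """Hypothetical check: does the word start with a non-mark character?"""
    return bool(word) and not unicodedata.category(word[0]).startswith("M")

middle = LAMED + CGJ + HOLAM + ALEF  # hypothetical: CGJ after a base letter
initial = CGJ + HOLAM + ALEF         # hypothetical: CGJ at the word start

print(unicodedata.category(CGJ), unicodedata.combining(CGJ))  # Mn 0
print(has_base(middle), has_base(initial))                    # True False
```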

    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.

    This archive was generated by hypermail 2.1.5 : Sat Aug 09 2003 - 19:02:56 EDT