Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Aug 13 2003 - 17:07:40 EDT


    From: "Peter Kirk" <peter.r.kirk@ntlworld.com>

    > On 13/08/2003 11:09, Philippe Verdy wrote:
    >
    > >... For this reason, defective
    > >combining sequences (combining characters without a leading base
    > >character) should be forbidden (invalid for XML).
    > >
    > >
    > If there is even the remotest possibility of this happening, we need
    > to know quickly! Defective combining sequences are legal Unicode and are
    > now being suggested for use in Hebrew e.g. for holam male. But such a
    > definition would be useless if XML restricts the texts it can represent
    > to a subset of Unicode excluding such sequences.

    I had not noticed that the discussion about Hebrew holam male was
    related. In fact I know nothing about the Hebrew alphabet, so I could not
    follow the semantics being discussed, and did not realize that
    <holam, vav> was a "defective" encoding (in terms of combining
    sequences).
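
    To make the notion concrete: a defective combining sequence is a run of
    combining marks with no base character in front of it, for example at
    the very start of a text, or after a control or format character such
    as a line break or ZWNBS (U+FEFF). The Python sketch below flags where
    such runs start. It is a simplification, assuming that a "base
    character" is any character in General Category L, N, P, S or Zs and a
    "combining mark" is any character in General Category M; the helper
    names are hypothetical.

```python
import unicodedata

def is_mark(ch):
    # Simplifying assumption: combining mark = General Category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")

def is_base(ch):
    # Simplifying assumption: base = letters, numbers, punctuation,
    # symbols, or SPACE-like separators (Zs). Controls (Cc), format
    # characters such as U+FEFF (Cf), and line/paragraph separators are not.
    cat = unicodedata.category(ch)
    return cat[0] in "LNPS" or cat == "Zs"

def defective_sequence_starts(text):
    """Return the indices where a run of marks begins without a base."""
    starts = []
    has_base = False      # a base character precedes the current run of marks
    in_defective = False  # already inside a reported defective run
    for i, ch in enumerate(text):
        if is_mark(ch):
            if not has_base and not in_defective:
                starts.append(i)
                in_defective = True
        else:
            has_base = is_base(ch)
            in_defective = False
    return starts

if __name__ == "__main__":
    # HEBREW POINT HOLAM (U+05B9) before HEBREW LETTER VAV (U+05D5),
    # at the start of the text: the holam has no base.
    print(defective_sequence_starts("\u05B9\u05D5"))
    # Vav followed by holam: an ordinary, non-defective sequence.
    print(defective_sequence_starts("\u05D5\u05B9"))
```

    With this simplified rule, a line-initial holam placed before vav is
    flagged, and so is a holam placed after ZWNBS, since U+FEFF is a format
    character rather than a base; a SPACE followed by a mark is not flagged,
    which is why SPACE+NSM is the usual workaround today.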

    When I used the term "forbidden", I was only referring to possible
    security problems with XML, and the term was certainly too hasty.
    However, given that potential security and parsing issues do exist, the
    use of <holam, vav> to encode holam male may be another argument for
    proposing a neutral, invisible base character for combining
    characters. For Hebrew, this character would need to behave as a
    "letter", but for other isolated diacritics in Latin, Greek, Cyrillic,
    and probably also Hiragana and Katakana (the voiced sound marks), it
    would be better handled as a symbol.

    In an earlier message I suggested several possible semantics for such
    invisible character(s):
    - an invisible symbol
    - an invisible LTR letter
    - an invisible RTL letter
    All of them would have a *compatibility* decomposition (NFKD form) to
    SPACE plus the mark, like the existing spacing forms of diacritics, but
    would not be canonically equivalent to SPACE. This would keep the legacy
    semantics, properties, behavior and known caveats of SPACE+NSM separate
    and unchanged (implementation- and usage-dependent, as they are now);
    SPACE+NSM could then be discouraged in Unicode and strongly deprecated
    in SGML/HTML/XML.
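
    The decomposition pattern described above already exists for spacing
    diacritics such as U+00B4 ACUTE ACCENT or U+309B KATAKANA-HIRAGANA
    VOICED SOUND MARK: their compatibility decomposition is SPACE followed
    by the corresponding combining mark, while they have no canonical
    decomposition. The Python sketch below checks this with the standard
    unicodedata module. The invisible base characters proposed here are
    not encoded, so only existing characters are used; the helper name is
    hypothetical.

```python
import unicodedata

# Existing spacing forms of diacritics (real, encoded characters; the
# proposed invisible base characters do not exist in Unicode).
SPACING_FORMS = ["\u00B4", "\u00A8", "\u02D8", "\u309B"]

def check_spacing_form(ch):
    """Return (name, raw decomposition, NFD unchanged?, NFKD == SPACE + mark?)."""
    nfd = unicodedata.normalize("NFD", ch)
    nfkd = unicodedata.normalize("NFKD", ch)
    space_plus_mark = (
        len(nfkd) == 2
        and nfkd[0] == " "
        and unicodedata.category(nfkd[1]).startswith("M")
    )
    return unicodedata.name(ch), unicodedata.decomposition(ch), nfd == ch, space_plus_mark

if __name__ == "__main__":
    for ch in SPACING_FORMS:
        print(check_spacing_form(ch))
```

    Because NFC and NFD leave these characters unchanged, they stay
    distinct from a literal SPACE+NSM sequence; only compatibility
    normalization (NFKC/NFKD) folds them together, which is the kind of
    separation the proposal aims for.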



    This archive was generated by hypermail 2.1.5 : Wed Aug 13 2003 - 18:06:27 EDT