Re: BOM as WJ?

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Nov 19 2003 - 14:02:50 EST


    On 19/11/2003 01:49, Pim Blokland wrote:

    >In the online 4.0 book, chapter 15
    >
    >http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf
    >
    >the definition for Word Joiner says:
    >
    >>Until Unicode 3.1.1, U+FEFF was the only code point with word
    >>joining semantics, but because it is more commonly used as
    >>byte order mark, the use of U+2060 [word joiner] to indicate
    >>word joining is strongly preferred for any new text.
    >
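    A minimal sketch of why the quoted passage prefers U+2060, assuming
    Python's standard "utf-8-sig" codec as a stand-in for any BOM-aware
    decoder (the helper name roundtrip_utf8_sig is purely illustrative):
    a leading U+FEFF is taken to be a byte order mark and dropped, while
    U+2060 survives.

```python
import codecs

ZWNBS = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark
WJ = "\u2060"     # WORD JOINER

def roundtrip_utf8_sig(text):
    """Encode as UTF-8, then decode with a BOM-aware codec (illustrative helper)."""
    return text.encode("utf-8").decode("utf-8-sig")

# A leading U+FEFF is indistinguishable from a UTF-8 BOM, so it is stripped.
leading_zwnbs = roundtrip_utf8_sig(ZWNBS + "word")

# A leading U+2060 is never mistaken for a BOM and is preserved.
leading_wj = roundtrip_utf8_sig(WJ + "word")

# U+FEFF in the middle of the text is left alone by this codec.
inner_zwnbs = roundtrip_utf8_sig("a" + ZWNBS + "b")

print(repr(leading_zwnbs), repr(leading_wj), repr(inner_zwnbs))
```

    This shows only the decoding ambiguity at the start of a text; it
    says nothing about line or word breaking behaviour.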
    Perhaps this depends on what is meant by "word joining semantics". I would
    presume this to imply that a word boundary is not permitted at this
    point, but in fact, under the current definitions in UAX #29
    (http://www.unicode.org/reports/tr29/tr29-5.html), ZWNBS, WJ and NBSP are
    all treated as word boundary characters.

    >However, a couple of paragraphs up, the definition for No-Break
    >Space says:
    >
    >>U+00A0 [No-Break Space] behaves like the following coded
    >>character sequence: U+FEFF [Zero Width No-Break Space] +
    >>U+0020 [Space] + U+FEFF [Zero Width No-Break Space].
    >
    >Is this something that has slipped by the editors? Or am I missing
    >something?
    >
    >Pim Blokland
    >
    >
    Does this equivalence hold when combining characters are applied to the
    NBSP? Is the sequence <NBSP, CC> (recommended for spacing diacritics,
    where CC is any sequence of combining characters) equivalent to <ZWNBS,
    SP, ZWNBS, CC>? Or should the equivalence be to <ZWNBS, SP, CC, ZWNBS>?
    Is it legal to combine combining characters with ZWNBS, or WJ, and how
    should this be rendered?
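
    To make the candidate sequences concrete, here is a small sketch
    using Python's standard unicodedata module. U+0301 COMBINING ACUTE
    ACCENT stands in for CC purely as an example, and the helper
    char_before_first_mark is hypothetical. It reports only which code
    point immediately precedes the combining mark in each sequence; it
    does not settle which sequence is the intended equivalent or how any
    of them should be rendered.

```python
import unicodedata

NBSP = "\u00A0"   # NO-BREAK SPACE
ZWNBS = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE
SP = "\u0020"     # SPACE
CC = "\u0301"     # COMBINING ACUTE ACCENT (example stand-in for "CC")

# The three sequences asked about in the message above.
candidates = {
    "nbsp_cc": NBSP + CC,
    "cc_after_trailing_zwnbs": ZWNBS + SP + ZWNBS + CC,
    "cc_before_trailing_zwnbs": ZWNBS + SP + CC + ZWNBS,
}

def char_before_first_mark(seq):
    """Return the code point directly before the first combining mark, if any."""
    for i, ch in enumerate(seq):
        if unicodedata.combining(ch) != 0:
            return seq[i - 1] if i > 0 else None
    return None

for label, seq in candidates.items():
    names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]
    before = char_before_first_mark(seq)
    print(label, names, "mark follows", unicodedata.name(before))

# The character database itself records only a compatibility mapping
# from NBSP to SPACE, with no ZWNBS on either side.
nbsp_decomposition = unicodedata.decomposition(NBSP)
print(nbsp_decomposition)
```

    In the second sequence the mark directly follows ZWNBS, in the third
    it follows SP, which is exactly the difference the question turns on.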

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

