Word dividers, was: proposals I wrote (and also, didn't write)

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 07 2004 - 17:47:25 CST

    On 06/12/2004 22:41, E. Keown wrote:

    >Proposal to add Samaritan Pointing to the UCS
    >WG2 number: N2748
    I notice that Elaine is proposing here a HEBREW SAMARITAN PUNCTUATION
    WORD DIVIDER, which should be in the BMP because Samaritan is a script in
    modern use. But there is already in the pipeline a PHOENICIAN WORD
    SEPARATOR, provisionally U+1091F, and already defined U+10101 AEGEAN
    WORD SEPARATOR DOT, and also of course U+00B7 MIDDLE DOT. The glyphs for
    all of these seem indistinguishable, and so do their functions. The only
    difference seems to be the scripts they are associated with, but
    punctuation marks are not supposed to be tied to individual scripts.

    Is there really a need for so many almost identical word divider dots?
    Can't they be unified? Is there a good reason not to use U+00B7 for all
    of these? There might be a need to tailor word breaks and line break
    opportunities. Directionality should not be a problem as it should come
    from the context. But is there really a sufficiently strong argument for
    this multiplication of dots?
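
    One rough way to check how far the existing dots already agree is to
    compare their character properties. The following is only a sketch: it
    uses Python's standard unicodedata module, the helper names describe and
    same_properties are made up for illustration, and U+1091F was only
    provisional at the time of writing, so it may show up as unassigned
    depending on the Unicode version bundled with the interpreter. Matching
    properties would not settle the unification question by themselves, since
    word break and line break behaviour are not covered here.

```python
import unicodedata

# Code points named in the message. U+1091F was provisional when the message
# was written, so the name lookup falls back to a placeholder if the Unicode
# database bundled with this Python does not assign it.
CANDIDATES = [0x00B7, 0x10101, 0x1091F]


def describe(cp):
    """Return the basic properties of one code point (hypothetical helper)."""
    ch = chr(cp)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unassigned>"),
        "category": unicodedata.category(ch),   # general category, e.g. "Po"
        "bidi": unicodedata.bidirectional(ch),  # bidirectional class
    }


def same_properties(a, b):
    """True if two code points share general category and bidi class."""
    da, db = describe(a), describe(b)
    return (da["category"], da["bidi"]) == (db["category"], db["bidi"])


if __name__ == "__main__":
    for cp in CANDIDATES:
        d = describe(cp)
        print(d["code"], d["name"], d["category"], d["bidi"])
    print("U+00B7 vs U+10101 share properties:",
          same_properties(0x00B7, 0x10101))
```

    If the general category and bidirectional class come out the same, the
    remaining differences would be the script association and any tailoring
    of break opportunities, which is the question raised above.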

    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
