Re: Word dividers, was: proposals I wrote (and also, didn't write)

From: John Cowan (
Date: Tue Dec 07 2004 - 21:36:37 CST


    Peter Kirk scripsit:

    > I notice that Elaine is here proposing a HEBREW SAMARITAN PUNCTUATION
    > WORD DIVIDER - and this should be in the BMP as Samaritan is a script in
    > the modern list. But there is already in the pipeline a PHOENICIAN WORD
    > SEPARATOR, provisionally U+1091F, and already defined U+10101 AEGEAN
    > WORD SEPARATOR DOT, and also of course U+00B7 MIDDLE DOT. The glyphs for
    > all of these seem indistinguishable, and so are the functions. The only
    > difference seems to be the scripts they are associated with, but
    > punctuation marks are not supposed to be tied to individual scripts.

    Well, some are and some aren't. Arabic ? is definitely tied to Arabic,
    for example. As usual, Unicode is empirical rather than rational.

    In any case, MIDDLE DOT, despite its official classification as
    punctuation, requires special treatment because of its use in
    Catalan orthography as effectively a modifier letter, so it is
    not useful to unify it with anything else. (It is already
    canonically equivalent to GREEK ANO TELEIA, which is regrettable.)
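The canonical equivalence between ANO TELEIA and MIDDLE DOT can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the constant names are illustrative only.

```python
import unicodedata

MIDDLE_DOT = "\u00B7"  # U+00B7 MIDDLE DOT
ANO_TELEIA = "\u0387"  # U+0387 GREEK ANO TELEIA

# U+0387 has a singleton canonical decomposition to U+00B7, so
# normalization (NFC or NFD) always replaces it with MIDDLE DOT.
print(unicodedata.decomposition(ANO_TELEIA))                   # 00B7
print(unicodedata.normalize("NFC", ANO_TELEIA) == MIDDLE_DOT)  # True
print(unicodedata.normalize("NFD", ANO_TELEIA) == MIDDLE_DOT)  # True
```

Because singleton decompositions never recompose, any normalized Greek text loses the distinction, which is why the two cannot be told apart after NFC.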

    > Is there really a need for so many almost identical word divider dots?

    Probably not. We already have gobs of dots. It's one of those things:
    on the other hand, Unicode unifies all the Indic dandas, for example.
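The dots named in this thread, and the shared danda, can be listed with Python's `unicodedata` module. This is a rough sketch. `describe` is a hypothetical helper, and the names and categories reported depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX <general category> <character name>' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


# Dot-like word separators named in this thread. The Samaritan divider
# had no code point at the time of this thread, so it is left out here.
DOTS = ["\u00B7", "\u0387", "\U00010101", "\U0001091F"]

# The danda pair is encoded once, in the Devanagari block, and is
# shared by other Indic scripts such as Bengali and Gujarati.
DANDAS = ["\u0964", "\u0965"]

for ch in DOTS + DANDAS:
    print(describe(ch))
```

Every entry reports general category Po (other punctuation), so nothing in the properties themselves distinguishes the word-separator dots from one another.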

    But you, Wormtongue, you have done what you could for your true master.  Some
    reward you have earned at least.  Yet Saruman is apt to overlook his bargains.
    I should advise you to go quickly and remind him, lest he forget your faithful
    service.  --Gandalf             John Cowan <>
