Re: U+0140

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 15 2004 - 15:32:22 EDT

    Philippe opined:

    > If there's something really missing for Catalan, it's a middle-dot letter with
    > general category "Lo", and combining class 0 (i.e. NOT combining).

    The one thing for sure is that the Unicode Standard does not need
    to encode more middle dots:

    00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
    0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;;;;;N;;;;;
    1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;
    22C5;DOT OPERATOR;Sm;0;ON;;;;;N;;;;;
    2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;;;;N;;;;;
    302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;
    30FB;KATAKANA MIDDLE DOT;Pc;0;ON;;;;;N;;;;;
    FE45;SESAME DOT;Po;0;ON;;;;;N;;;;;
    FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;;;;N;;;;;
    10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
    1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;;;;;N;;;;;
    2027;HYPHENATION POINT;Po;0;ON;;;;;N;;;;;
    16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;;;;;N;;;;;
    1802;MONGOLIAN COMMA;Po;0;ON;;;;;N;;;;;
    318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;
    1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;;;;;N;;;;;

    (and that's not counting the lowered dots, such as FULL STOP, or the
    raised dots)
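For readers who want to check the properties in records like these, here is a minimal sketch of reading the semicolon-separated UnicodeData.txt fields and filtering for what Philippe asks for (general category "Lo", combining class 0). The names `UcdRecord`, `parse_ucd_line` and `SAMPLE` are hypothetical and introduced only for illustration; the three sample lines are copied from the list above.

```python
from typing import NamedTuple


class UcdRecord(NamedTuple):
    """The first five fields of a UnicodeData.txt record (hypothetical helper type)."""
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_ucd_line(line: str) -> UcdRecord:
    """Split one UnicodeData.txt line into its leading fields."""
    fields = line.strip().split(";")
    if len(fields) != 15:  # each record has 15 semicolon-separated fields
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    return UcdRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


# Three records copied from the list in the message.
SAMPLE = [
    "00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;",
    "1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;",
    "302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;",
]

records = [parse_ucd_line(line) for line in SAMPLE]

# Dots that are already letters (Lo) and non-combining (class 0).
letter_dots = [r for r in records
               if r.general_category == "Lo" and r.combining_class == 0]

for r in letter_dots:
    print(f"U+{r.code_point:04X} {r.name}")
```

Run over the full list, the same filter would also pick up 318D HANGUL LETTER ARAEA, the other "Lo" entry.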

    > It's unfortunate that almost all legacy Catalan text transcoded to
    > Unicode are based on the middle-dot symbol (the one mapped in
    > ISO-8859-1 and ISO-8859-15) which is not seen by Unicode as a
    > letter (Lo) but as a symbol only.

    Actually, that is *fortunate*, not unfortunate, since it is the
    correct conversion from 8859-1 (and Windows 1252) data.

    How U+00B7 behaves in Catalan data is then a matter of local
    *adaptation* of software for the correct handling of the Catalan
    language.
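As one illustration of what such local adaptation might look like, the sketch below is a hypothetical word tokenizer that keeps U+00B7 inside a word only when it sits between two l's, and otherwise treats it as ordinary punctuation. The names `WORD_RE` and `catalan_words` are assumptions made for this example, not part of any existing library, and real Catalan-aware software would need to handle more cases than this.

```python
import re

# A "word" is a run of letters; U+00B7 is allowed inside the run only when
# it is flanked by l/L on both sides (the Catalan <l, middle dot, l> sequence).
# [^\W\d_] matches any Unicode letter in Python 3 str patterns.
WORD_RE = re.compile(r"(?:[^\W\d_]|(?<=[lL])\u00B7(?=[lL]))+")


def catalan_words(text: str) -> list[str]:
    """Return the words in text, keeping l + U+00B7 + l together (illustrative only)."""
    return WORD_RE.findall(text)


sample = "La col\u00B7lecci\u00F3 \u00B7 paral\u00B7lel"
print(catalan_words(sample))
```

The free-standing middle dot in the sample is skipped, while the ones in "col·lecció" and "paral·lel" stay inside their words. The character itself remains U+00B7 throughout; only the software's treatment of it changes.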

    Note that while the particular combination <006C, 00B7, 006C> is
    a peculiarity of Catalan orthography, U+00B7 MIDDLE DOT (often
    called a 'raised period') is very widely used, indeed, in technical
    orthographies for many languages, particularly in the Americas,
    where it is used much more commonly than the IPA characters
    U+02D0 MODIFIER LETTER TRIANGULAR COLON or U+02D1 MODIFIER LETTER
    HALF TRIANGULAR COLON to indicate vocalic (or less commonly,
    consonantal) length.
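To make the contrast concrete, here is a small sketch using Python's standard `unicodedata` and `re` modules. It shows that the IPA length marks are modifier letters (Lm) while U+00B7 is punctuation (Po), so a generic `\w+` tokenizer splits a word at a middle dot used as a length mark. The sample strings `a·ta` and `aːta` are made-up illustrations, not words from any particular orthography.

```python
import re
import unicodedata

# General categories as reported by the Unicode data bundled with Python.
for ch in ("\u00B7", "\u02D0", "\u02D1"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")

# A generic word pattern treats the two length notations differently.
with_middle_dot = re.findall(r"\w+", "a\u00B7ta")   # length written with U+00B7
with_ipa_length = re.findall(r"\w+", "a\u02D0ta")   # length written with U+02D0

print(with_middle_dot)
print(with_ipa_length)
```

The same kind of adaptation that Catalan needs for l·l therefore applies to any orthography that writes length with U+00B7.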

    Obsessing about the behavior of U+00B7 in Catalan data while
    ignoring its use as a vowel length indicator in many, many
    other orthographies is rather pointless, it seems to me.

    --Ken


