Re: U+0140

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 15 2004 - 14:47:56 EDT


    > Did you get an answer on this ? Why is there no decomposition associated
    > to this character ?

    Thanks to Eric and Patrick for digging out my answer on this perennial
    question from a couple years back, and saving me the trouble of
    having to rummage around to find it. :-)

    Also, it should be noted that there *is* a decomposition for
    U+0140 in the Unicode Character Database, to wit:

    0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;...
                                                     ^^^^^^^^^^^^^^^^^^
                                                     
    It is a compatibility decomposition for two reasons. First, the
    decomposition into the sequence <006C, 00B7> may result in rendering
    differences, both because implementations may make different
    decisions about where to render the dot and because the introduction
    of U+00B7 MIDDLE DOT might affect line-break decisions, depending on
    the implementation. Second, the properties of the characters in the
    sequence <006C, 00B7> differ from those of <0140> by itself, which
    may affect things such as identifier parsing, again depending on the
    implementation. And, as I indicated before, U+0140 is itself
    basically a compatibility character, introduced for mapping to
    ISO 6937, a preexisting standard that was among the character
    encoding standards intended to be covered by the initial Unicode
    repertoire.
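    The difference between a canonical and a compatibility decomposition
    can be seen directly with Python's standard unicodedata module. This
    is a minimal sketch, assuming a Python 3 interpreter; the module
    reports whatever Unicode version it was built with, although this
    particular decomposition has not changed. Canonical normalization
    (NFD) leaves U+0140 intact, compatibility normalization (NFKD) splits
    it, and the resulting characters carry different General_Category
    values, which is one concrete way the "properties differ" point shows
    up (here via str.isalpha rather than a full identifier parser).

```python
import unicodedata

L_MIDDLE_DOT = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

# The raw UCD decomposition field, including its <compat> tag.
print(unicodedata.decomposition(L_MIDDLE_DOT))  # <compat> 006C 00B7

# Canonical decomposition leaves U+0140 alone...
nfd = unicodedata.normalize("NFD", L_MIDDLE_DOT)
# ...while compatibility decomposition splits it into <006C, 00B7>.
nfkd = unicodedata.normalize("NFKD", L_MIDDLE_DOT)

print([f"U+{ord(c):04X}" for c in nfd])   # ['U+0140']
print([f"U+{ord(c):04X}" for c in nfkd])  # ['U+006C', 'U+00B7']

# The properties differ: a single Ll letter vs. a letter plus punctuation.
print([unicodedata.category(c) for c in nfd])   # ['Ll']
print([unicodedata.category(c) for c in nfkd])  # ['Ll', 'Po']

# So a simple "all letters?" test gives different answers for the two forms.
print(nfd.isalpha(), nfkd.isalpha())  # True False
```

    Any code that normalizes to NFKC or NFKD (for example, for
    identifier matching or search folding) will therefore treat U+0140
    and <006C, 00B7> as equivalent, while NFC and NFD will not.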

    The character *was* in ISO 6937 for Catalan. Noting the Catalan
    association in the Unicode names list is not the same as
    recommending U+0140 as the preferred character for representing
    l followed by a middle dot in Catalan text. Most existing Catalan
    data (primarily in ISO 8859-1 and Windows-1252) would not use it,
    of course. Converted to Unicode, that data would also not use it,
    but would instead be represented as the sequence <006C, 00B7>.
    And there is every expectation that new data created in Unicode
    would continue to use that sequence for Catalan.
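    To illustrate why converted legacy data ends up as <006C, 00B7>,
    here is a small sketch in Python. The Catalan word "col·lecció" and
    its byte values are an assumed example of an ISO 8859-1 /
    Windows-1252 file, not data from this thread. Both encodings have
    MIDDLE DOT at 0xB7 and no L-with-middle-dot, so decoding can only
    ever produce the two-character sequence; NFKC, if applied, folds any
    stray U+0140 into that same sequence.

```python
import unicodedata

# Assumed example: "col·lecció" as stored in a Latin-1 / Windows-1252 file.
# 0xB7 is MIDDLE DOT and 0xF3 is o-acute in both encodings.
legacy_bytes = b"col\xb7lecci\xf3"

for codec in ("latin-1", "cp1252"):
    text = legacy_bytes.decode(codec)
    # Decoding yields the plain sequence <006C, 00B7>, never U+0140.
    assert "\u0140" not in text
    print(codec, [f"U+{ord(c):04X}" for c in text[2:4]])

# U+0140 cannot be written back to ISO 8859-1 at all.
try:
    "co\u0140lecci\u00f3".encode("latin-1")
except UnicodeEncodeError as err:
    print("not encodable:", err.reason)

# NFKC folds the U+0140 spelling into the conventional sequence.
folded = unicodedata.normalize("NFKC", "co\u0140lecci\u00f3")
print(folded == legacy_bytes.decode("latin-1"))  # True
```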

    --Ken



    This archive was generated by hypermail 2.1.5 : Thu Apr 15 2004 - 15:31:24 EDT