Re: U+0140

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 15 2004 - 14:47:56 EDT

Next message: Kenneth Whistler: "Re: U+0140"

Previous message: Patrick Andries: "Re: U+0140"
Maybe in reply to: Patrick Andries: "Re: U+0140"
Next in thread: Patrick Andries: "Re: U+0140"
Reply: Patrick Andries: "Re: U+0140"

> Did you get an answer on this ? Why is there no decomposition associated
> to this character ?

Thanks to Eric and Patrick for digging out my answer on this perennial
question from a couple years back, and saving me the trouble of
having to rummage around to find it. :-)

Also, it should be noted that there *is* a decomposition for
U+0140 in the Unicode Character Database, to wit:

0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;...
                                                 ^^^^^^^^^^^^^^^^^^
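
For anyone who wants to check that record without opening UnicodeData.txt,
here is a small sketch using Python's standard unicodedata module (my choice
of tool for illustration; nothing in this thread depends on it). It reads the
decomposition field and shows that canonical normalization leaves U+0140
alone, while compatibility normalization applies the mapping.

```python
import unicodedata

L_MIDDLE_DOT = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

# Decomposition field as exposed from the Unicode Character Database.
decomp_field = unicodedata.decomposition(L_MIDDLE_DOT)

# Canonical normalization (NFD) ignores compatibility mappings ...
nfd = unicodedata.normalize("NFD", L_MIDDLE_DOT)
# ... while compatibility normalization (NFKD) applies them.
nfkd = unicodedata.normalize("NFKD", L_MIDDLE_DOT)

print(decomp_field)
print([f"{ord(ch):04X}" for ch in nfd])
print([f"{ord(ch):04X}" for ch in nfkd])
```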

It is a compatibility decomposition for two reasons. First, the decomposition
into the sequence <006C, 00B7> may result in rendering differences
(both because of potentially different decisions about where to
render the dot and because the introduction of the U+00B7 MIDDLE DOT
might impact line break decisions, depending on the implementation).
Second, the properties of the characters in the sequence
<006C, 00B7> are distinct from those for <0140> by itself, and
may impact things such as identifier parsing, again depending on
the implementation. And, as I indicated before, U+0140 is itself
basically a compatibility character, introduced for mapping to
ISO 6937, a preexisting standard that was among the list of
character encoding standards intended to be covered by the initial
Unicode repertoire.
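
The property difference can be seen with a short sketch, again assuming
Python's unicodedata module. The all_letters() helper is a hypothetical,
deliberately naive stand-in for whatever word or identifier test a given
implementation uses; real identifier rules vary, and some make special
allowance for U+00B7, so treat this only as an illustration of why the two
representations can behave differently.

```python
import unicodedata

precomposed = "\u0140"   # the single compatibility character
sequence = "l\u00b7"     # the sequence <006C, 00B7>


def categories(text):
    """Return the General_Category of each code point in text."""
    return [unicodedata.category(ch) for ch in text]


def all_letters(text):
    """Hypothetical naive test: every code point must be a letter (L*)."""
    return all(unicodedata.category(ch).startswith("L") for ch in text)


print(categories(precomposed), all_letters(precomposed))
print(categories(sequence), all_letters(sequence))
```

The precomposed character is a single lowercase letter, whereas the sequence
is a letter followed by a punctuation character, so a letters-only scan
accepts the first and stops at the dot in the second.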

The character *was* in ISO 6937 for Catalan. Noting the Catalan
association in the Unicode names list is different from any
recommendation that U+0140 is the preferred character for the
representation of l followed by a middle dot in Catalan text.
Most existing Catalan data (primarily 8859-1 and Windows 1252)
would not use it, of course. Converted to Unicode, that data would
still not use it, but would be represented as the sequence <006C, 00B7>.
And there is every expectation that new data created in Unicode
would continue to use such a sequence for Catalan.
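
As a rough illustration of the conversion point, the sketch below decodes a
made-up sample word ("paral·lel", chosen by me, not taken from any real data
set) from ISO 8859-1 and Windows 1252 using Python's built-in codecs. In both
cases the byte 0xB7 maps to U+00B7, so U+0140 never appears, and canonical
composition (NFC) does not introduce it either.

```python
import unicodedata

# Hypothetical legacy sample: "paral·lel" with the middle dot as byte 0xB7.
legacy_bytes = b"paral\xb7lel"

from_latin1 = legacy_bytes.decode("iso-8859-1")
from_cp1252 = legacy_bytes.decode("cp1252")

# NFC only recombines canonical decompositions, so <006C, 00B7> stays as is.
nfc = unicodedata.normalize("NFC", from_latin1)

print([f"{ord(ch):04X}" for ch in from_latin1])
print("U+0140 present:", "\u0140" in nfc)
```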

--Ken


This archive was generated by hypermail 2.1.5 : Thu Apr 15 2004 - 15:31:24 EDT