First of all, apologies for the attachment problem. This is the
first time I've wanted to distribute a file to this list, and
file size limitations never even dawned on my few grey cells.
So, the file in question is available at
Secondly, Kent wrote:
>SpecialCasing.txt (and UTR 21) appear to allow mapping to
>"not immediately related" characters. Maybe the ones you
>enumerate should be added to SpecialCasing.txt?
If U-bar (e.g.) is encoded as 0055 + 0335, then of course this
would be a special case of casing (no pun intended). The text
of UTR-21 says
Case mappings may produce strings of different length than the
example, the German character 00DF "ß" small letter sharp s
uppercased to the sequence of two characters "SS". This also
there is no precomposed character corresponding to a case
The last sentence is applicable here: there are no precomposed
upper case characters corresponding to
U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE
U+0289 LATIN SMALL LETTER U BAR
U+019A LATIN SMALL LETTER L WITH BAR
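As a rough illustration of the length-changing mappings that UTR-21 describes, here is a small Python sketch. It assumes Python 3, whose str.upper() applies the unconditional full case mappings from whatever Unicode data ships with the interpreter. The helper name full_upper_report is made up for this example.

```python
def full_upper_report(ch):
    """Return the uppercased string and its code points for one character."""
    upper = ch.upper()
    return upper, [f"{ord(c):04X}" for c in upper]

# U+00DF sharp s: one character becomes two when uppercased.
print(full_upper_report("\u00DF"))

# U+01F0 j with caron: no precomposed capital exists, so the
# uppercase form is a base letter plus a combining mark.
print(full_upper_report("\u01F0"))
```

The results for other characters depend on the Unicode version bundled with the interpreter, so they may differ from the state of the standard discussed in this message.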
However, when I look in SpecialCasing.txt, all of the Latin
characters of this sort that are handled in this file have a
canonical or compatibility decomposition. The ones that I've
mentioned do not. I didn't know whether that matters at all for
normalization. This was my question #2. The
corollary question is whether or not there is a need for LATIN
CAPITAL LETTER U BAR, etc.
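The decomposition point can be checked with a short Python sketch. It assumes Python 3's unicodedata module and the Unicode data it bundles. The names CANDIDATES and has_decomposition are invented for illustration, and U+01F0 is included only as a contrasting character that does decompose.

```python
import unicodedata

# The three small letters mentioned in this message.
CANDIDATES = ["\u026B", "\u0289", "\u019A"]

def has_decomposition(ch):
    """True if the character has a canonical or compatibility decomposition."""
    return unicodedata.decomposition(ch) != ""

for ch in CANDIDATES + ["\u01F0"]:
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: decomposes={has_decomposition(ch)}")
```

Because the three letters have no decomposition, normalizing them to NFD or NFKD leaves them unchanged, whereas U+01F0 splits into 006A + 030C.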
I suppose that the greater likelihood of problems would lie in
the instances where one case had a decomposition but the other
case did not. I was just hoping to get some authoritative
feedback to educate me about such matters.