From: Mark Davis (mark.davis@jtcsv.com)
Date: Mon Jul 12 2004 - 11:18:54 CDT
> John [Cowan]'s list is not "a few characters".
Let's take Latin, for starters. There are 1870 entries in the UCA for Latin.
If you subtract from John's list the ones that are already interleaved -- as
I did in my email -- then you get 78 values, or about 4%.
I'll repeat that list below, since it seems to have escaped notice.
Now, one could argue that the letters without uppercase pairs are only used
technically (e.g. in IPA), and thus should be excluded. If so, that leaves
us with 52 (26 upper+lower), or about 3%.
If we really wanted to minimize the number of changes, then we could exclude
the ones that are for languages that rarely occur in data. I did a quick
check on http://www.eki.ee/letter/, and put what I found below. This is
*not* a complete analysis, and would need to be extended to the other
scripts, but we would then be talking about 10 letters (5 upper+lower) or
0.5% with a very restrictive list, about double that if we included a few
more.
So, yes, I do think it will probably end up being a pretty small list.
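For anyone who wants to check the arithmetic, here is a rough sketch in Python. The counts (1870, 78, 52, 10) are the ones quoted in this message; the labels and the share() helper are made up purely for illustration.

```python
# Counts quoted in this message; labels and names are illustrative only.
LATIN_ENTRIES = 1870  # UCA entries for Latin, as stated above

candidates = {
    "not already interleaved": 78,
    "only letters with uppercase pairs": 52,
    "very restrictive list": 10,
}

def share(count, total=LATIN_ENTRIES):
    """Return count as a percentage of total."""
    return 100.0 * count / total

for label, count in candidates.items():
    print(f"{label}: {count} of {LATIN_ENTRIES} = {share(count):.1f}%")
```

The rounded figures in the text ("about 4%", "about 3%", "0.5%") are approximations of these ratios.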
Mark
=======
Capitals by language on http://www.eki.ee/letter/
da [Danish]; fo [Faroese]; kl [Greenlandic]; no [Norwegian];
00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
01FE; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE AND ACUTE (no information, but included for consistency with O WITH STROKE)
bs [Bosnian]; hr [Croatian]; sami1 [Inari Sámi]; sami2 [North Sámi]; sami4
[Skolt Sámi]; sl [Slovenian]; vi [Vietnamese];
0110; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH STROKE
mt [Maltese];
0126; 0048; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER H WITH STROKE
pl [Polish]; sorb1 [Lower Sorbian]; sorb2 [Upper Sorbian]; sla [Kashubian];
0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE
sami2 [North Sámi];
0166; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH STROKE
01E4; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH STROKE
ha [Hausa]; ff [Fula]; or bm [Bambara];
0181; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH HOOK
018A; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH HOOK
0198; 004B; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER K WITH HOOK
01B3; 0059; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Y WITH HOOK
019D; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LEFT HOOK
No Information
0187; 0043; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER C WITH HOOK
0191; 0046; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER F WITH HOOK
0193; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH HOOK
01A4; 0050; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER P WITH HOOK
01AC; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH HOOK
01B2; 0056; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER V WITH HOOK
0224; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH HOOK
0197; 0049; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER I WITH STROKE
01B5; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH STROKE
0182; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH TOPBAR
018B; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH TOPBAR
0220; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
019F; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH MIDDLE TILDE
01AE; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
==============
List of items from John's list that are not already interleaved.
0181; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH HOOK
0182; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH TOPBAR
0187; 0043; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER C WITH HOOK
0110; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH STROKE
018A; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH HOOK
018B; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH TOPBAR
0191; 0046; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER F WITH HOOK
0193; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH HOOK
01E4; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH STROKE
0126; 0048; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER H WITH STROKE
0197; 0049; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER I WITH STROKE
0198; 004B; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER K WITH HOOK
0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE
019D; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LEFT HOOK
0220; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
019F; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH MIDDLE TILDE
01FE; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01A4; 0050; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER P WITH HOOK
0166; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH STROKE
01AC; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH HOOK
01AE; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
01B2; 0056; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER V WITH HOOK
01B3; 0059; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Y WITH HOOK
01B5; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH STROKE
0224; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH HOOK
1E9A; 0061; !nfd+remove_marks; !uca #LATIN SMALL LETTER A WITH RIGHT HALF RING
0180; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH STROKE
0183; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH TOPBAR
0253; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH HOOK
0188; 0063; !nfd+remove_marks; !uca #LATIN SMALL LETTER C WITH HOOK
0255; 0063; !nfd+remove_marks; !uca #LATIN SMALL LETTER C WITH CURL
0111; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH STROKE
018C; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH TOPBAR
0221; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH CURL
0256; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH TAIL
0257; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH HOOK
0192; 0066; !nfd+remove_marks; !uca #LATIN SMALL LETTER F WITH HOOK
01E5; 0067; !nfd+remove_marks; !uca #LATIN SMALL LETTER G WITH STROKE
0260; 0067; !nfd+remove_marks; !uca #LATIN SMALL LETTER G WITH HOOK
0127; 0068; !nfd+remove_marks; !uca #LATIN SMALL LETTER H WITH STROKE
0266; 0068; !nfd+remove_marks; !uca #LATIN SMALL LETTER H WITH HOOK
0268; 0069; !nfd+remove_marks; !uca #LATIN SMALL LETTER I WITH STROKE
029D; 006A; !nfd+remove_marks; !uca #LATIN SMALL LETTER J WITH CROSSED-TAIL
0199; 006B; !nfd+remove_marks; !uca #LATIN SMALL LETTER K WITH HOOK
0140; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH MIDDLE DOT
0142; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH STROKE
019A; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH BAR
0234; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH CURL
026B; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH MIDDLE TILDE
026C; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH BELT
026D; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH RETROFLEX HOOK
0271; 006D; !nfd+remove_marks; !uca #LATIN SMALL LETTER M WITH HOOK
019E; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH LONG RIGHT LEG
0235; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH CURL
0272; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH LEFT HOOK
0273; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH RETROFLEX HOOK
00F8; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE
01FF; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE AND ACUTE
01A5; 0070; !nfd+remove_marks; !uca #LATIN SMALL LETTER P WITH HOOK
02A0; 0071; !nfd+remove_marks; !uca #LATIN SMALL LETTER Q WITH HOOK
027C; 0072; !nfd+remove_marks; !uca #LATIN SMALL LETTER R WITH LONG LEG
027D; 0072; !nfd+remove_marks; !uca #LATIN SMALL LETTER R WITH TAIL
0282; 0073; !nfd+remove_marks; !uca #LATIN SMALL LETTER S WITH HOOK
0167; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH STROKE
01AB; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH PALATAL HOOK
01AD; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH HOOK
0236; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH CURL
0288; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH RETROFLEX HOOK
028B; 0076; !nfd+remove_marks; !uca #LATIN SMALL LETTER V WITH HOOK
01B4; 0079; !nfd+remove_marks; !uca #LATIN SMALL LETTER Y WITH HOOK
01B6; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH STROKE
0225; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH HOOK
0290; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH RETROFLEX HOOK
0291; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH CURL
025A; 0259; !nfd+remove_marks; !uca #LATIN SMALL LETTER SCHWA WITH HOOK
0286; 0283; !nfd+remove_marks; !uca #LATIN SMALL LETTER ESH WITH CURL
01BA; 0292; !nfd+remove_marks; !uca #LATIN SMALL LETTER EZH WITH TAIL
0293; 0292; !nfd+remove_marks; !uca #LATIN SMALL LETTER EZH WITH CURL
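The entries above all follow the same "source; base; flags #name" layout. Below is a minimal Python sketch of how such a list could be read and what interleaving would do to the "Søren"/"Sozar" case quoted further down. It assumes each entry sits on one line. The names parse_entry, fold and interleaved_key are hypothetical, and no real UCA weights are involved: plain code point order merely stands in for the current behaviour, and a naive fold to the base letter stands in for the interleaved one.

```python
def parse_entry(line):
    """Split 'XXXX; YYYY; flags #NAME' into (source char, base char, name)."""
    data, _, name = line.partition("#")
    fields = [f.strip() for f in data.split(";")]
    source = chr(int(fields[0], 16))
    base = chr(int(fields[1], 16))
    return source, base, name.strip()

# Two entries copied from the list above.
SAMPLE = [
    "00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE",
    "00F8; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE",
]

# Translation table: code point of the special letter -> its base letter.
fold = {ord(src): base for src, base, _ in map(parse_entry, SAMPLE)}

def interleaved_key(word):
    """Primary key on the folded form; the original word breaks ties."""
    return (word.translate(fold).lower(), word)

names = ["Soret", "Sozar", "Søren"]
print(sorted(names))                       # code point order (stand-in)
print(sorted(names, key=interleaved_key))  # folded, i.e. interleaved
```

With the fold applied, "Søren" lands among the other "Sor..." names instead of after "Sozar", which is the effect under discussion.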
Mark
----- Original Message -----
From: "Michael Everson" <everson@evertype.com>
To: <unicode@unicode.org>
Sent: Saturday, July 10, 2004 04:20
Subject: Re: Changing UCA primary weights (bad idea)
> At 17:34 -0700 2004-07-09, Mark Davis wrote:
>
> >What I think we should be examining is which of the items that are not
> >interfiled (to use your phrasing) should be, if any. I don't think
> >everything should be. In particular, I think John's list is the list we
> >should be focusing on.
>
> I think most of what is in John [Cowan]'s list
> are letters which are quite properly not
> interfiled with "base" letters. The African hook
> letters (which I have mentioned many times, and
> which you have ignored in favour of the Danish
> letters you are more familiar with) are there.
>
> > > John's list?
> >
> >That was in my original mail, that you were commenting on when you
> >changed the subject line, but which you apparently didn't bother to
> >actually read.
>
> Sweet of you to say.
>
> > > My point is made here. It is really only in
> >> initial position where this is likely to be
> >> noticed.
> >
> >This is incorrect. It will make a difference in other positions. Sorting
> >"Søren" after "Sozar" in a long list, if someone isn't expecting it, will
> >cause problems. They look for it after "Soret", don't see it on the page,
> >and assume it isn't there; fooled by the fact that it is on a completely
> >different page.
>
> No way! Do you expect your default tailorable
> template to suddenly and magically relieve the
> user of the problems of long lists and multi-page
> typesetting? Sheesh. No matter how much you
> jiggle either the template or a tailoring for
> people who only know the letters A-Z, there will
> be edge cases which will fail this kind of test.
>
> >Remember that the collation sequence is also used for language-sensitive
> >matching as well as sorting.
>
> I remember.
>
> > > What I want is the status quo, however.
> >> Leave the template and its principles alone.
> >
> >Stability is important, and we want to consider that very carefully
> >before making any change. However, I believe that the current way we
> >handle a few characters in UCA is distinctly suboptimal, and worth
> >considering.
>
> John [Cowan]'s list is not "a few characters".
> --
> Michael Everson * * Everson Typography * * http://www.evertype.com
>
>
>