From: Markus Scherer (markus.icu@gmail.com)
Date: Fri Mar 25 2005 - 13:35:50 CST
On Thu, 24 Mar 2005 15:32:09 +0100, Theo Veenker <Theo.Veenker@let.uu.nl> wrote:
> The descriptions for Final_Sigma and Before_Dot are clear to me. For
> After_Soft_Dotted, More_Above and After_I, I don't see how the descriptions
> and the regexps represent *exactly* the same thing. For these I don't
> see the \p{cc=0} parts reflected in the descriptions. Also isn't the
> After_I regexp missing a "*"?
You mention another typo below. Looking at the current version, I
don't see the typos, and I see phrases like "with no intervening
character of type 0" corresponding to the \p{cc=0} parts.
http://www.unicode.org/versions/Unicode4.1.0/
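
To illustrate how the \p{cc=0} parts of the regexps line up with the "no intervening character" wording, here is a minimal sketch in Python of three of the conditions. It is not the ICU code and not Theo's functions; the function names and the (string, index) calling convention are assumptions made for this example, and it relies only on unicodedata.combining() from the standard library for the canonical combining class. After_Soft_Dotted is left out because the standard library does not expose the Soft_Dotted property.

```python
import unicodedata


def more_above(s, i):
    """More_Above: s[i] is followed by a character of combining class 230,
    with no intervening character of combining class 0 or 230."""
    for ch in s[i + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 230:
            return True   # reached an Above mark with nothing blocking it
        if ccc == 0:
            return False  # a class-0 character ends the search
    return False


def after_i(s, i):
    """After_I: an uppercase I precedes s[i], with no intervening
    character of combining class 0 or 230."""
    for ch in reversed(s[:i]):
        if ch == "I":     # test this first: "I" itself has class 0
            return True
        if unicodedata.combining(ch) in (0, 230):
            return False
    return False


def before_dot(s, i):
    """Before_Dot: s[i] is followed by U+0307 COMBINING DOT ABOVE, with no
    intervening character of combining class 0 or 230."""
    for ch in s[i + 1:]:
        if ch == "\u0307":  # test this first: U+0307 itself has class 230
            return True
        if unicodedata.combining(ch) in (0, 230):
            return False
    return False
```

In each function, stopping at a character of combining class 0 or 230 is what the ([^\p{cc=230}\p{cc=0}])* part of the regexp expresses: only marks of other classes (for example U+0323 COMBINING DOT BELOW, class 220) may sit in between.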
> The functions below represent what I make of the descriptions and the
> regexps. Are they correct?
I just took a very brief look at some of them, and they look ok. Feel
free to compare with my implementation in ICU. In the current version,
it's in ucase.c, roughly the second half of the file.
Our site recently moved - WebCVS has this currently at
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icu/source/common/ucase.c
but WebCVS may move once more. You can also just download ICU 3.2 or
use anonymous CVS. See http://www.ibm.com/software/globalization/icu/
In older ICU releases, very similar code was in uchar.c. The code
comments quote an older Unicode version, but the conditions have not
substantially changed since then.
markus