Those are some good comments on ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt. You might also be interested in a file I put up to help visualize the relationships with case folding. It is in draft form now and not 'public', but comments are welcome. See http://www.unicode.org/unicode/reports/tr21/CaseFolding.html.
See below for some responses to your message.
Patrick Andries wrote:
> I have a few questions regarding TR21 (just trying to grasp).
> 1) Why is the titlecase form for 0149 ('n) the decomposed 02BC 006E (' + n)
> See <Unicode 3.0 CD>/Unidata/SpecialCasing-2.txt. Why could it not be 0149 ?
This is a bug; the entry should be:
0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
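For readers who want to check such entries mechanically, here is a minimal sketch of reading one SpecialCasing.txt line, assuming the field order code; lower; title; upper; optional conditions; then a # comment. The function name and the returned dictionary keys are illustrative choices of mine, not anything defined by the data file.

```python
def parse_special_casing_line(line):
    """Parse one SpecialCasing.txt entry; return None for blank or comment-only lines."""
    # Everything after '#' is a human-readable comment (here: the character name).
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 4 or not fields[0]:
        return None

    def to_str(field):
        # A field is a space-separated list of hexadecimal code points.
        return "".join(chr(int(cp, 16)) for cp in field.split())

    return {
        "code": to_str(fields[0]),
        "lower": to_str(fields[1]),
        "title": to_str(fields[2]),
        "upper": to_str(fields[3]),
        # Optional fifth field: locale or context conditions (empty for this entry).
        "conditions": fields[4].split() if len(fields) > 4 else [],
        "name": comment.strip(),
    }


entry = parse_special_casing_line(
    "0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE"
)
```

With the corrected line, both the title and upper mappings come out as U+02BC followed by U+004E (capital N), while the lower mapping stays U+0149.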
> a) Because 0149 is not considered as a single letter ?
> b) Could it be because if the lowercase letter were to equal the
> titlecase letter, the string beginning by that character can no longer be
> detected as lowercase (see 2.2) or even titlecase (any lowercase letters
> must follow cased characters ) given the current definitions ?
> 2) Can the Afrikaans titlecase word « ' + n » (indefinite article « a ») be
> detected as a titlecased ?
> In other words (in pseudo-code), isTitleCase(toTitleCase("\u0149")) == true
> I believe not. Is it important ?
No, it wouldn't be.
> 3) I wonder if some subtlety has not escaped me in the following description
> «Detecting Titlecase
> A string is titlecase if all four of the following conditions are true:
> a.. there is at least one cased character in the string
> b.. there are no distinct-uppercase (Lud) characters
> c.. any lowercase letters must follow cased characters
> d.. there are no titlecase or uppercase letters, except following uncased
> characters »
You are right. The clearest wording would probably be:
d.. no titlecase or uppercase letters follow cased characters
> Would it not be clearer if the last part had a « or at the beginning of the
> string » appended to it ? As far as I understand, the string may contain an
> uppercase letter and no uncased characters and still be titlecase.
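To make the conditions concrete, here is a rough sketch of the titlecase test using conditions a and c plus the reworded d. It rests on two assumptions that are mine, not TR21's: "cased" is approximated by the general categories Lu, Ll and Lt, and condition b is left out because the distinct-uppercase (Lud) property is not available in Python's unicodedata module. The name is_titlecase is only illustrative.

```python
import unicodedata

# Assumption: treat exactly these general categories as "cased".
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def is_titlecase(s):
    """Approximate titlecase check (conditions a, c and reworded d only)."""
    seen_cased = False   # condition a: at least one cased character
    prev_cased = False   # was the previous character cased?
    for ch in s:
        cat = unicodedata.category(ch)
        # Condition c: a lowercase letter must follow a cased character.
        if cat == "Ll" and not prev_cased:
            return False
        # Reworded condition d: no titlecase or uppercase letter
        # may follow a cased character.
        if cat in ("Lu", "Lt") and prev_cased:
            return False
        prev_cased = cat in CASED_CATEGORIES
        seen_cased = seen_cased or prev_cased
    return seen_cased
```

Under these assumptions "Hello World" passes, while "hello", "HELLO" and a string with no cased characters do not. A string that starts with an uppercase letter and contains no uncased characters, such as "Hello", also passes, which is the case the reworded condition d is meant to cover.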
> 4) Though I do not believe there is any mention of « sentence casing » in
> TR21, curious readers may be interested in noticing that in Afrikaans, when
> a sentence begins with an «'n», the next word is Titlecased (see
> http://hapax.iquebec.com). I do not know whether this merits mentioning
> anywhere in the technical reports, but the naïve approach of casing
> sentences (i.e applying toTitleCase() to the first word) will therefore not
> work under some locales.
Yes, there should be a note to that effect. In general, the casing of sentences and titles is language-dependent. As another example, "Taming of the Shrew" (with lowercase "of" and "the") would be the appropriate capitalization for a title in English.
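A small sketch of the English title case shows why a per-word titlecase operation is not enough. The list of minor words and the function name english_title are hypothetical, chosen only for illustration; real style guides differ, and other languages need different rules altogether.

```python
# Hypothetical, deliberately short list of English words left in lowercase
# inside a title; actual editorial conventions vary.
MINOR_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to"})


def english_title(text):
    """Capitalize each word except minor words that are not first in the title."""
    words = text.lower().split()
    result = []
    for index, word in enumerate(words):
        if index > 0 and word in MINOR_WORDS:
            result.append(word)
        else:
            result.append(word.capitalize())
    return " ".join(result)


naive = "taming of the shrew".title()          # capitalizes every word
styled = english_title("taming of the shrew")  # keeps "of" and "the" lowercase
```

Here naive is "Taming Of The Shrew" and styled is "Taming of the Shrew".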
> Patrick Andries
> Dorval (Québec)