Re: Case mapping

From: Mark Davis (
Date: Sat May 06 2000 - 19:05:06 EDT

Those are some good comments. You might also be interested in a file I put up to help visualize the relationships in case folding. It is in draft form now and not 'public', but comments are welcome. See

See below for some responses to your message.


Patrick Andries wrote:

> I have a few questions regarding TR21 (just trying to understand it).
> 1) Why is the titlecase form for 0149 ('n) the decomposed 02BC 006E (' + n)?
> See <Unicode 3.0 CD>/Unidata/SpecialCasing-2.txt. Why could it not be 0149?

This is a bug: it should be:


> a) Because 0149 is not considered a single letter?
> b) Could it be because, if the lowercase letter were equal to the
> titlecase letter, a string beginning with that character could no longer be
> detected as lowercase (see 2.2) or even as titlecase (any lowercase letters
> must follow cased characters), given the current definitions?
> 2) Can the Afrikaans titlecase word « ' + n » (the indefinite article « a ») be
> detected as titlecase?
> In other words (in pseudo-code), is isTitleCase(toTitleCase("\u0149")) == true?
> I believe not. Is it important?

No, it wouldn't be.
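To make question 2 concrete, here is a small Python sketch. It relies only on the built-in `str.title()` and `str.istitle()`, which use current Unicode data (including the full mappings in SpecialCasing.txt) rather than the 2000 draft discussed in this thread, and `istitle()` applies rules close to, but not identical with, the TR21 draft conditions quoted below.

```python
# Titlecase U+0149 with Python's built-in full case mappings.
s = "\u0149"                 # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
t = s.title()                # full titlecase mapping from SpecialCasing.txt
print(" ".join(f"U+{ord(c):04X}" for c in t))   # U+02BC U+004E

print(s.istitle())           # False: a lone lowercase letter
print(t.istitle())           # True: uppercase N follows an uncased apostrophe
print("\u02bcn".istitle())   # False: the 02BC 006E mapping from the question
```

With an uppercase N after the apostrophe the round trip succeeds; with the lowercase n from the draft file it cannot, because a lowercase letter follows an uncased character.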

> 3) I wonder whether some subtlety has escaped me in the following description:
> «Detecting Titlecase
> A string is titlecase if all four of the following conditions are true:
> a. there is at least one cased character in the string
> b. there are no distinct-uppercase (Lud) characters
> c. any lowercase letters must follow cased characters
> d. there are no titlecase or uppercase letters, except following uncased
> characters »

You are right. Probably the clearest wording would be:

d. no titlecase or uppercase letters follow cased characters
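The four conditions, with condition d in the revised wording above, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not TR21's reference algorithm: "cased" is approximated by the general categories Lu, Ll and Lt (the full Cased property also covers Other_Lowercase and Other_Uppercase), "follow" is read as "immediately follow", and "distinct-uppercase (Lud)" is read as an uppercase letter whose titlecase form differs, such as U+01C4 DŽ, which titlecases to U+01C5 Dž. The helper names are hypothetical.

```python
import unicodedata


def is_cased(ch: str) -> bool:
    # Approximation: only general categories Lu, Ll and Lt count as cased.
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")


def is_distinct_uppercase(ch: str) -> bool:
    # Assumed reading of "Lud": an uppercase letter whose titlecase form
    # differs from it, e.g. U+01C4 (DŽ) titlecases to U+01C5 (Dž).
    return unicodedata.category(ch) == "Lu" and ch.title() != ch


def is_titlecase(s: str) -> bool:
    has_cased = False    # condition a: at least one cased character
    prev_cased = False   # is the immediately preceding character cased?
    for ch in s:
        cat = unicodedata.category(ch)
        if is_distinct_uppercase(ch):            # condition b
            return False
        if cat == "Ll" and not prev_cased:       # condition c
            return False
        if cat in ("Lu", "Lt") and prev_cased:   # condition d (revised)
            return False
        prev_cased = is_cased(ch)
        has_cased = has_cased or prev_cased
    return has_cased
```

Because `prev_cased` starts out false, an uppercase letter at the very start of the string passes condition d, so `is_titlecase("A")` is true, as Patrick suggests. As far as I know, Python's built-in `str.istitle()` applies similar rules but has no equivalent of condition b, so it accepts `"\u01c4ungla"` where this sketch does not.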

> Would it not be clearer if the last part had « or at the beginning of the
> string » appended to it? As far as I understand, the string may contain an
> uppercase letter and no uncased characters and still be titlecase.
> 4) Though I do not believe there is any mention of « sentence casing » in
> TR21, curious readers may be interested to notice that in Afrikaans, when
> a sentence begins with «'n», the next word is titlecased (see
> I do not know whether this merits mentioning
> anywhere in the technical reports, but the naïve approach to casing
> sentences (i.e. applying toTitleCase() to the first word) will therefore not
> work for some locales.

Yes, there should be a note to that effect. In general, the casing of sentences and titles is language dependent. For another example, "Taming of the Shrew" would be the appropriate capitalization for a title in English, not the "Taming Of The Shrew" that titlecasing every word would produce.
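A sketch of the two language-dependent rules discussed here, in Python. Everything in it is an assumption for illustration: the function names, the set of spellings accepted for the Afrikaans article (precomposed U+0149, or an apostrophe followed by n), splitting words on single spaces, and the short list of English minor words, which in practice varies between style guides.

```python
# Hypothetical spellings of the Afrikaans indefinite article 'n.
AF_ARTICLE = {"\u0149", "\u02bcn", "\u2019n", "'n"}

# Hypothetical list of English words left lowercase inside a title;
# real style guides disagree on this list.
EN_MINOR_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "for"}


def capitalize_first(word: str) -> str:
    # Titlecase only the first character; str.title() would also
    # capitalize letters after apostrophes ("don't" -> "Don'T").
    return word[:1].title() + word[1:]


def naive_sentence_case(sentence: str) -> str:
    # The naive approach: apply a titlecase operation to the first word.
    words = sentence.split(" ")
    words[0] = words[0].title()
    return " ".join(words)


def sentence_case_af(sentence: str) -> str:
    words = sentence.split(" ")
    if words[0] in AF_ARTICLE and len(words) > 1:
        # Afrikaans: the article stays lowercase, the next word is titlecased.
        words[1] = capitalize_first(words[1])
    else:
        words[0] = capitalize_first(words[0])
    return " ".join(words)


def title_case_en(title: str) -> str:
    # Capitalize every word except minor words after the first position.
    words = title.lower().split(" ")
    return " ".join(
        w if i > 0 and w in EN_MINOR_WORDS else capitalize_first(w)
        for i, w in enumerate(words)
    )


print(naive_sentence_case("'n boek is hier"))   # 'N boek is hier
print(sentence_case_af("'n boek is hier"))      # 'n Boek is hier
print(title_case_en("taming of the shrew"))     # Taming of the Shrew
print("taming of the shrew".title())            # Taming Of The Shrew
```

The last line shows what titlecasing every word produces, which is exactly what the English convention avoids.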

> Patrick Andries
> Dorval (Québec)

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT