Re: Question about “Uppercase” in DerivedCoreProperties.txt

From: Mike FABIAN <>
Date: Sat, 08 Nov 2014 10:22:10 +0100

Philippe Verdy <> wrote:

> note that tolower() and toupper() can only work at the 1-character level; they
> are not recommended for changing the case of plain text.
> For correct handling of locales, tolower and toupper should be replaced by
> strtolower and strtoupper (or their aliases), which are able to process
> character clusters and contextual casing rules needed for a language or
> orthographic style

Yes, thank you for explaining this.

But these details of upper and lower casing cannot be expressed in the
“i18n” file of glibc:;a=blob;f=localedata/locales/i18n

For toupper and tolower, this file just has character -> character
mapping tables, for example the “tolower” table contains only


(i.e. mapping Σ U+03A3 -> σ U+03C3, never to the final sigma ς).

More correct, detailed information about upper and lower case must come
from elsewhere, not from this “i18n” file in glibc. Using only the
information from this “i18n” file, not even the Greek sigma can be
handled correctly.
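
To illustrate the limitation, here is a minimal sketch in Python (not glibc code). It assumes that lowercasing each character in isolation is a fair stand-in for a character -> character table like the one in the “i18n” file; the helper name per_char_lower is made up for this example. Python's whole-string str.lower() applies the contextual final-sigma rule, which a per-character mapping has no way to express.

```python
def per_char_lower(text):
    # Hypothetical helper: lowercase every character on its own, the way a
    # character -> character table would, without looking at any context.
    return "".join(ch.lower() for ch in text)


word = "ΟΔΟΣ"  # Greek word ending in capital sigma (U+03A3)

per_char = per_char_lower(word)  # ends in σ (U+03C3), wrong at the end of a word
whole = word.lower()             # ends in ς (U+03C2), context-aware lowercasing

print(per_char)
print(whole)
```

The two results differ only in the last letter, and that difference is exactly the information that cannot be stored in a one-to-one mapping table.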

Pravin and I want to update this “i18n” file to the latest
data from Unicode 7.0.0, doing it as correctly as possible within
the limitations imposed by this file and the ISO C standard.

Mike FABIAN <>
☏ Office: +49-69-365051027, internal 8875027
Unicode mailing list
Received on Sat Nov 08 2014 - 03:23:24 CST
