Yes, a string can change size under case conversion, for several reasons.
1. The case conversion causes an expansion. For example:
0149; 0149; 02BC 006E; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY
1F80; 1F80; 1F88; 1F00 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND
2. The case mapping crosses a UTF-8 size boundary. These boundaries are
at 7F, 7FF, FFFF.
0049; 0049; 0131; 0131; tr; # LATIN CAPITAL LETTER I (in Turkish)
3. You can also get shrinkage in UTF-8, because of boundary crossing!
017F;LATIN SMALL LETTER LONG S;Ll;0;L;<compat> 0073;;;;N;;;0053;;0053
4. By chance, 00DF (es-zed) both expands (it uppercases to two characters,
SS) and contracts in UTF-8 (each S falls below the 7F boundary), thus
ending up with the same number of bytes: two!
5. Look at the data files (SpecialCasing.txt, UnicodeData.txt) for more
examples; future versions of the standard may add other examples as well.
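
The byte counts in cases 1 to 4 can be checked with a short script. This is only a sketch, assuming Python 3, whose built-in str.upper() applies the unconditional full case mappings; the helper names utf8_len and size_change are made up for illustration. The Turkish pair is hard-coded because Python's case methods are not locale-aware.

```python
def utf8_len(s: str) -> int:
    """Return the number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def size_change(before: str, after: str) -> tuple:
    """Return (bytes before, bytes after) for one case conversion."""
    return utf8_len(before), utf8_len(after)


# (label, original string, case-converted string)
CASES = [
    # 1. Expansion: one character maps to two.
    ("U+0149 uppercased", "\u0149", "\u0149".upper()),
    # 2. Growth from crossing the 7F boundary (Turkish I -> dotless i).
    #    Hard-coded: str.lower() is not locale-aware and would return "i".
    ("U+0049 lowercased (Turkish)", "I", "\u0131"),
    # 3. Shrinkage from crossing the 7F boundary in the other direction.
    ("U+017F uppercased", "\u017f", "\u017f".upper()),
    # 4. Expansion and boundary crossing cancel out.
    ("U+00DF uppercased", "\u00df", "\u00df".upper()),
]

for label, before, after in CASES:
    old, new = size_change(before, after)
    print(f"{label}: {old} -> {new} bytes")
```

The practical consequence is that a buffer sized for the input cannot be assumed large enough for the case-converted output.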
BTW, there were some production problems with SpecialCasing.txt that
resulted in some bad mappings. This will be corrected soon.
Hallvard B Furuseth wrote:
> Can a UTF-8 string ever become longer when it's converted to upper- or