UCD.html and simple titlecase

From: Martin v. Löwis (martin@v.loewis.de)
Date: Sat Jan 17 2009 - 12:21:15 CST


    Currently, UCD.html says this about Simple_Titlecase_Mapping:

    Note: The simple titlecase may be omitted in the data file if the
    titlecase is the same as the uppercase.

    I think this note is inconsistent with the current UnicodeData.txt.

    For example, UnicodeData.txt has

    01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;

    So we have:
    - upper case: U+01C4
    - lower case: U+01C6
    - title case: omitted, hence the same as uppercase, hence U+01C4
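    To make that reading concrete, here is a minimal sketch that splits the
    record exactly as quoted above and applies the Note's rule. The function
    names are mine and purely illustrative; the only assumption is the usual
    UnicodeData.txt layout, in which the last three of the fifteen
    semicolon-separated fields are the simple uppercase, lowercase, and
    titlecase mappings.

```python
# The record exactly as quoted in this message, joined onto one line.
RECORD = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;"
)


def simple_case_fields(record):
    """Return the code point and the three simple case mapping fields.

    Assumes the 15-field UnicodeData.txt layout: field 0 is the code point,
    fields 12, 13, 14 are simple uppercase, lowercase, and titlecase.
    """
    fields = record.split(";")
    return {
        "code": fields[0],
        "upper": fields[12],
        "lower": fields[13],
        "title": fields[14],  # empty string when the mapping is omitted
    }


def titlecase_per_note(record):
    """Apply the Note as written: an omitted titlecase means 'same as uppercase'."""
    f = simple_case_fields(record)
    return f["title"] or f["upper"]


if __name__ == "__main__":
    print(simple_case_fields(RECORD))
    print("titlecase per the Note:", titlecase_per_note(RECORD))
```

    Read this way, the titlecase letter U+01C5 gets U+01C4 as its simple
    titlecase, which is the result questioned below.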

    I think this is surprising: U+01C5 is already a titlecase letter,
    so its simple titlecase should be U+01C5.

    To fix this, I think one would have to either
    a) change the Note in UCD.html to read
       "The simple titlecase is omitted in the data file if the titlecase
       is the same as the code point itself",
    or
    b) change UnicodeData.txt to explicitly list the titlecase mapping
       of each titlecase character as the character itself.
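    As a rough sketch of how the two options compare, the snippet below
    resolves the simple titlecase of the quoted record under either default
    rule, and also for a record with the titlecase field filled in as option
    (b) would do. The function, its `default` parameter, and the constant
    names are hypothetical and only serve the illustration.

```python
# Record as quoted in this message (titlecase field empty).
AS_QUOTED = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;"
)

# Option (b): the same record with the titlecase mapping listed explicitly.
WITH_EXPLICIT_TITLE = AS_QUOTED + "01C5"


def simple_titlecase(record, default="code_point"):
    """Resolve the simple titlecase mapping of one UnicodeData.txt record.

    default="uppercase"  -> the Note as currently worded in UCD.html
    default="code_point" -> the rewording proposed as option (a)
    """
    fields = record.split(";")
    code, upper, title = fields[0], fields[12], fields[14]
    if title:
        return title  # an explicit mapping always wins
    return code if default == "code_point" else upper


if __name__ == "__main__":
    print("current Note:", simple_titlecase(AS_QUOTED, default="uppercase"))
    print("option (a):  ", simple_titlecase(AS_QUOTED, default="code_point"))
    print("option (b):  ", simple_titlecase(WITH_EXPLICIT_TITLE, default="uppercase"))
```

    Under either option the result for U+01C5 is U+01C5 itself; only the
    Note as currently worded, applied to the record as quoted, gives U+01C4.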

    What do you think?

    Regards,
    Martin
