Re: Titlecasing words starting with numeric glyphs and period as word separator

From: Mark Davis ☕
Date: Tue Feb 22 2011 - 01:56:17 CST

    The default Unicode rules cannot cover all languages or circumstances
    properly. It is worth bringing any proposals (and/or problem cases)
    concerning the default rules to the Unicode Technical Committee, but bear
    in mind that those default rules will never be able to cover all languages
    well. Acronyms, hyphenations, and contractions present particular
    problems: there are some notes on some of them in

    You can have discussions here or on the, but to
    get on the next agenda (May) for the UTC, make sure that there is a proposal
    filed by a member or by you on

    > "word separating rules optimized for titlecasing" could be slightly
    different from general word separating rules

    Language-specific rules, such as those for titlecasing, fall under the
    CLDR technical committee. Tickets for adding structure and data for
    language-specific titlecasing were filed some time ago, but the work has
    not yet reached a high enough relative priority for the committee to take
    it up. Having such "word separating rules optimized for titlecasing" was
    the direction the committee was thinking of. I have put it on the agenda
    for the next CLDR meeting (that committee meets weekly by phone), and you
    can file a ticket with additional information and/or example problem
    cases that you'd like to see handled:


    *— Il meglio è l’inimico del bene —*

    On Mon, Feb 21, 2011 at 23:15, Koji Ishii wrote:

    > Hello,
    > There's a discussion going on in the W3C CSS mailing list[1] about the
    > specification of the text-transform property[2], specifically how the
    > "capitalize" value titlecases the specified span of text.
    > During the discussion, two cases were presented:
    > 1. A word starting with numeric glyphs (e.g., "99ers") can become
    > "99Ers" if we follow the rules defined in 5.18 Case Mappings. Has this been
    > discussed here, and is it up to implementations to decide which words to
    > titlecase, or should this be fixed in the Unicode spec?
    > 2. We're thinking of using UAX #29 to separate words and then applying
    > Titlecase_Mapping to every word. But doing so turns "a.m." into "A.m.",
    > which contradicts the general publication rules[3]. I understand that
    > both word separation and titlecasing are ambiguous and cannot be perfect,
    > and that we must make compromises. But since Unicode defines these two
    > rules separately, I guess there's a possibility that "word separating rules
    > optimized for titlecasing" could be slightly different from general word
    > separating rules. I haven't thought much about counter-cases against doing
    > so, but I wonder if anyone on this ML has ideas, including whether we
    > should do it or not, and whether we should include other cases.
    > Any feedback is greatly appreciated.
    > Regards,
    > Koji
    > [1]
    > [2]
    > [3]
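
The two problem cases in the quoted message can be reproduced with a few lines of Python. This is only a rough sketch under stated assumptions: words are split on single spaces rather than by the real Unicode word-boundary rules, "cased" is approximated by the general categories Lu, Ll and Lt, and the function names (default_titlecase_word, tailored_titlecase_word, titlecase_text) are made up for illustration. The "tailored" variant shows one hypothetical rule, not anything the UTC or CLDR committee has adopted.

```python
import unicodedata


def is_cased(ch: str) -> bool:
    # Approximation of the Unicode "Cased" property (assumption):
    # uppercase, lowercase, or titlecase letters only.
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")


def default_titlecase_word(word: str) -> str:
    # Default-style behavior: titlecase the first cased character,
    # lowercase everything after it, leave what precedes it untouched.
    for i, ch in enumerate(word):
        if is_cased(ch):
            return word[:i] + ch.title() + word[i + 1:].lower()
    return word  # no cased character at all, e.g. "9"


def tailored_titlecase_word(word: str) -> str:
    # Hypothetical tailoring: leave words that start with a digit alone.
    if word[:1].isdigit():
        return word
    return default_titlecase_word(word)


def titlecase_text(text: str, word_fn=default_titlecase_word) -> str:
    # Simplified word separation (assumption): split on single spaces.
    return " ".join(word_fn(w) for w in text.split(" "))


print(titlecase_text("99ers at 9 a.m."))
print(titlecase_text("99ers at 9 a.m.", tailored_titlecase_word))
```

With the default-style function, "99ers" comes out as "99Ers" and "a.m." as "A.m.", matching both cases raised above. The tailored function fixes only the first case; handling "a.m." would need a separate decision about how periods inside a word are treated, which is exactly the open question in this thread.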

    This archive was generated by hypermail 2.1.5 : Tue Feb 22 2011 - 01:59:31 CST