RE: Titlecasing words starting with numeric glyphs and period as word separator

From: Koji Ishii (kojiishi@gluesoft.co.jp)
Date: Tue Feb 22 2011 - 22:14:01 CST

Next message: Mark Davis ☕: "Re: Titlecasing words starting with numeric glyphs and period as word separator"

Previous message: Asmus Freytag: "Re: [unicode] Re: UTF-c"
In reply to: Mark Davis ☕: "Re: Titlecasing words starting with numeric glyphs and period as word separator"
Next in thread: Mark Davis ☕: "Re: Titlecasing words starting with numeric glyphs and period as word separator"
Reply: Mark Davis ☕: "Re: Titlecasing words starting with numeric glyphs and period as word separator"

Thank you, Mark, for the guidance.

I apologize for any brashness; I’m new here.

I’m sorry I didn’t state clearly what I wanted. All I want for now is to present what was discussed on the CSS mailing list, hear what people here have to say, and hopefully start some discussion.

I’m not sure yet whether I want it on the next agenda, but if I do, I’ll follow your instructions.

Regards,
Koji

From: mark.edward.davis@gmail.com On Behalf Of Mark Davis ☕
Sent: Tuesday, February 22, 2011 4:56 PM
To: Koji Ishii
Cc: unicode@unicode.org
Subject: Re: Titlecasing words starting with numeric glyphs and period as word separator

The default Unicode rules cannot handle every language or circumstance properly. It is worth bringing any proposals (and/or problem cases) concerning the default rules to the Unicode Technical Committee, but bear in mind that those default rules will never be able to cover all languages well. Acronyms, hyphenations, and contractions present particular problems; http://www.unicode.org/reports/tr29/ has notes on some of them.

You can have discussions here or on the http://unicode.org/forum/, but to get on the next agenda (May) for the UTC, make sure that there is a proposal filed by a member or by you on http://www.unicode.org/reporting.html.

> "word separating rules optimized for titlecasing" could be slightly different from general word separating rules

Language-specific rules, such as those for titlecasing, fall under the CLDR technical committee <http://cldr.unicode.org/>. Tickets were filed some time ago for adding structure and data for language-specific titlecasing, but the work hadn't reached a high enough relative priority for the committee to take it up. Having such "word separating rules optimized for titlecasing" is the direction the committee was considering. I put it on the agenda for the next CLDR meeting (that committee meets weekly by phone), and you can file a ticket with additional information and/or example problem cases that you'd like to see handled: http://unicode.org/cldr/trac/newticket

Mark

— Il meglio è l’inimico del bene (“The best is the enemy of the good”) —

On Mon, Feb 21, 2011 at 23:15, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
Hello,

There's a discussion going on in the W3C CSS mailing list[1] about the specification of the text-transform property[2], specifically about how the "capitalize" value should titlecase the specified span of text.

During the discussion, two cases were presented:

1. Titlecasing a word that starts with digits (e.g., "99ers") yields "99Ers" if we follow the rules defined in Section 5.18, Case Mappings. Has this been discussed here? Is it up to implementations to decide which words titlecasing applies to, or should this be fixed in the Unicode specification?

2. We're considering using UAX #29 to separate words and then applying Titlecase_Mapping to every word. But doing so turns "a.m." into "A.m.", which contradicts common publication rules[3]. I understand that both word separation and titlecasing are ambiguous, cannot be perfect, and require compromises. But since Unicode defines these two sets of rules separately, I suspect that "word separating rules optimized for titlecasing" could be slightly different from general word separating rules. I haven't thought much about counter-cases, but I'd like to hear from anyone on this list, including whether we should do this at all, or whether we should cover other cases as well.
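Both cases can be reproduced with a small Python sketch. It is a simplified approximation, not a conforming implementation: `is_cased`, `titlecase_word`, `split_words`, `titlecase_text` and the options `join_across_period` and `skip_digit_initial` are hypothetical names. The word splitter only models the UAX #29 behavior that keeps letters joined across a period or apostrophe (roughly rules WB6/WB7), and `is_cased` approximates the Unicode Cased property with Python's string predicates. With the defaults, each word gets the default treatment (titlecase the first cased character, lowercase the rest); the two options model possible titlecasing-specific tailorings.

```python
# Simplified sketch of the two titlecasing problem cases.
# NOT a conforming implementation of UAX #29 or the Unicode case algorithms.

MID_CHARS = {".", "'", "\u2019"}  # small subset of MidLetter/MidNumLet


def is_cased(ch: str) -> bool:
    # Approximates the Unicode "Cased" property with str predicates.
    return ch.islower() or ch.isupper() or ch.istitle()


def titlecase_word(word: str, skip_digit_initial: bool = False) -> str:
    """Titlecase the first cased character and lowercase everything after it."""
    if skip_digit_initial and word[:1].isdigit():
        # Hypothetical policy: leave words such as "99ers" unchanged.
        return word
    for i, ch in enumerate(word):
        if is_cased(ch):
            return word[:i] + ch.title() + word[i + 1:].lower()
    return word  # no cased character (digits, punctuation, spaces)


def split_words(text: str, join_across_period: bool = True) -> list[str]:
    """Split text into word and non-word segments.

    A MID_CHARS character between two letters does not break a word
    (roughly UAX #29 WB6/WB7). join_across_period=False is a hypothetical
    titlecasing tailoring that always breaks at '.'.
    """
    segments = []
    i, n = 0, len(text)
    while i < n:
        if not text[i].isalnum():
            segments.append(text[i])  # each non-word character on its own
            i += 1
            continue
        j = i + 1
        while j < n:
            ch = text[j]
            if ch.isalnum():
                j += 1
            elif (ch in MID_CHARS
                  and (join_across_period or ch != ".")
                  and j + 1 < n
                  and text[j - 1].isalpha()
                  and text[j + 1].isalpha()):
                j += 2  # keep the mid character and the letter after it
            else:
                break
        segments.append(text[i:j])
        i = j
    return segments


def titlecase_text(text: str, join_across_period: bool = True,
                   skip_digit_initial: bool = False) -> str:
    return "".join(titlecase_word(seg, skip_digit_initial)
                   for seg in split_words(text, join_across_period))


print(titlecase_text("the 99ers meet at 9 a.m."))
# → The 99Ers Meet At 9 A.m.
print(titlecase_text("the 99ers meet at 9 a.m.",
                     join_across_period=False, skip_digit_initial=True))
# → The 99ers Meet At 9 A.M.
```

The flags do not settle which behavior is right; they only make the default result and the two tailored results easy to compare side by side.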

Any feedback is greatly appreciated.

Regards,
Koji

[1] http://lists.w3.org/Archives/Public/www-style/2011Feb/0621.html
[2] http://dev.w3.org/csswg/css3-text/#text-transform
[3] http://www.businesswritingblog.com/business_writing/2009/06/what-is-the-correct-time-am-pm-am-pm-am-pm-.html


This archive was generated by hypermail 2.1.5 : Tue Feb 22 2011 - 22:19:45 CST