Re: Titlecasing words starting with numeric glyphs and period as word separator

From: Mark Davis ☕ (mark@macchiato.com)
Date: Wed Feb 23 2011 - 09:57:39 CST

    I didn't take what you said as brash at all; you and others at CSS are
    looking for a solution to your issue, and there is no reason you would
    know the structure and process used in the Unicode Consortium. Such a
    solution could use structures and properties already defined (by the UTC
    and CLDR-TC), or result in improvements or extensions to those structures.

    I should have also mentioned that the W3C has a liaison relationship with
    the Unicode Consortium, and you can also work through knowledgeable people
    in the i18n group in the W3C, such as Richard Ishida and Addison Phillips.

    Mark

    *The best is the enemy of the good.*

    On Tue, Feb 22, 2011 at 20:14, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

    > Thank you, Mark, for pointing me in the right direction.
    >
    > I apologize for any brashness, as I’m new here.
    >
    > I didn’t write what I wanted very clearly, and I’m sorry about that. All
    > I want for now is to present what was discussed at CSS, hear what people
    > here have to say, and hopefully have some discussion.
    >
    > I’m not sure whether I want it to be on the next agenda at this point, but
    > I’ll follow your instructions if I do.
    >
    > Regards,
    > Koji
    >
    > *From:* mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] *On
    > Behalf Of *Mark Davis ☕
    > *Sent:* Tuesday, February 22, 2011 4:56 PM
    > *To:* Koji Ishii
    > *Cc:* unicode@unicode.org
    > *Subject:* Re: Titlecasing words starting with numeric glyphs and period
    > as word separator
    >
    > The default Unicode rules cannot cover all languages or circumstances
    > properly. It is worth bringing any proposals (and/or problem cases)
    > concerning the default rules to the Unicode Technical Committee, but bear
    > in mind that those default rules will never be able to cover all languages
    > well. Acronyms, hyphenations, and contractions present particular
    > problems; some of them are discussed in http://www.unicode.org/reports/tr29/.
    >
    > You can have discussions here or on http://unicode.org/forum/, but to get
    > on the next UTC agenda (May), make sure that a proposal is filed, by a
    > member or by you, at http://www.unicode.org/reporting.html.
    >
    > > "word separating rules optimized for titlecasing" could be slightly
    > > different from general word separating rules
    >
    > Language-specific rules, such as those for titlecasing, fall under the CLDR
    > Technical Committee <http://cldr.unicode.org/>. Tickets for adding
    > structure and data for language-specific titlecasing were filed some time
    > ago, but they hadn't reached a high enough relative priority for the
    > committee to work on. Having such "word separating rules optimized for
    > titlecasing" was the direction the committee was thinking of. I put it on
    > the agenda for the next CLDR meeting (that committee meets weekly by phone),
    > and you can file a ticket with additional information and/or example
    > problem cases that you'd like to see handled:
    > http://unicode.org/cldr/trac/newticket
    >
    > Mark
    >
    > *The best is the enemy of the good.*
    >
    > On Mon, Feb 21, 2011 at 23:15, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
    >
    > Hello,
    >
    > There's a discussion going on in the W3C CSS mailing list[1] about the
    > specification of the text-transform property[2], specifically how the
    > "capitalize" value titlecases the specified span of text.
    >
    > During the discussion, two cases were presented:
    >
    > 1. Titlecasing words that start with numeric glyphs (e.g., "99ers")
    > produces "99Ers" if we follow the rules defined in 5.18 Case Mappings. Has
    > this been discussed here, and is it up to implementations to define which
    > words titlecasing applies to, or should this be fixed in the Unicode spec?
    >
    > 2. We're thinking of using UAX #29 to separate words and then applying
    > Titlecase_Mapping to every word. But doing so turns "a.m." into "A.m.",
    > which contradicts general publication rules[3]. I understand that both
    > word separation and titlecasing are ambiguous, cannot be perfect, and
    > require compromises. But since Unicode defines these two rules separately,
    > I guess it's possible that "word separating rules optimized for
    > titlecasing" could be slightly different from general word separating
    > rules. I haven't thought much about counter-cases against doing so, but I
    > wonder if anyone on this list has ideas, including whether we should do it
    > at all, or whether we should include other cases as well.
    >
    > Any feedback is greatly appreciated.
    >
    > Regards,
    > Koji
    >
    > [1] http://lists.w3.org/Archives/Public/www-style/2011Feb/0621.html
    > [2] http://dev.w3.org/csswg/css3-text/#text-transform
    > [3] http://www.businesswritingblog.com/business_writing/2009/06/what-is-the-correct-time-am-pm-am-pm-am-pm-.html
    >
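Both problem cases follow from the default Unicode titlecasing rule: find the word boundaries, titlecase the first cased character after each boundary, and lowercase the rest of that word. The Python sketch below reproduces "99Ers" and "A.m." under that rule. Its word segmentation is a deliberately simplified, hypothetical subset of the UAX #29 word-boundary rules (not a conforming implementation), `_is_cased` only approximates the Unicode Cased property, and the `skip_digit_initial` flag is a hypothetical example of the kind of language-specific tailoring discussed for CLDR, not an existing Unicode, CLDR, or CSS option.

```python
import unicodedata

# Hypothetical subset of the UAX #29 MidNumLet class: full stop and apostrophes.
MID = {".", "'", "\u2019"}


def _is_letter(ch: str) -> bool:
    return unicodedata.category(ch).startswith("L")


def _is_digit(ch: str) -> bool:
    return unicodedata.category(ch) == "Nd"


def _is_cased(ch: str) -> bool:
    # Rough approximation of the Unicode "Cased" property.
    return ch.lower() != ch or ch.upper() != ch


def words(text: str):
    """Yield (start, end) spans of words under a simplified UAX #29 subset.

    Letters and digits join with each other (cf. WB5, WB8, WB9, WB10), and a
    MID character joins two letters or two digits (cf. WB6/WB7, WB11/WB12).
    Any other character is treated as a boundary on both sides.
    """
    i, n = 0, len(text)
    while i < n:
        if not (_is_letter(text[i]) or _is_digit(text[i])):
            i += 1
            continue
        start = i
        i += 1
        while i < n:
            ch = text[i]
            if _is_letter(ch) or _is_digit(ch):
                i += 1
            elif (ch in MID and i + 1 < n
                  and ((_is_letter(text[i - 1]) and _is_letter(text[i + 1]))
                       or (_is_digit(text[i - 1]) and _is_digit(text[i + 1])))):
                i += 2  # consume the MID character and the character after it
            else:
                break
        yield start, i


def to_titlecase(text: str, skip_digit_initial: bool = False) -> str:
    """Titlecase the first cased character of each word, lowercase the rest."""
    out = list(text)
    for start, end in words(text):
        if skip_digit_initial and _is_digit(text[start]):
            continue  # hypothetical tailoring: leave words like "99ers" alone
        for k in range(start, end):
            if _is_cased(text[k]):
                out[k] = text[k].title()  # may expand, e.g. "ß" -> "Ss"
                for j in range(k + 1, end):
                    out[j] = text[j].lower()
                break
    return "".join(out)


print(to_titlecase("99ers"))           # 99Ers
print(to_titlecase("meet at 9 a.m."))  # Meet At 9 A.m.
print(to_titlecase("99ers", skip_digit_initial=True))  # 99ers
```

For contrast, Python's built-in `str.title()` treats any run of letters as a word, so `"a.m.".title()` gives `"A.M."` and `"they're".title()` gives `"They'Re"`; the result depends heavily on which word-separating rules are chosen.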



    This archive was generated by hypermail 2.1.5 : Wed Feb 23 2011 - 10:03:19 CST