RE: Titlecasing words starting with numeric glyphs and period as word separator

From: Shawn Steele (Shawn.Steele@microsoft.com)
Date: Tue Mar 01 2011 - 17:01:54 CST


    Title casing is very language-specific, and, as noted below, what your English teacher expects likely isn’t what your average programmer thinks of when they think of title casing.

    - Shawn

     
    http://blogs.msdn.com/shawnste
    Selfhost a custom locale from \\scratch2\scratch\shawnste\customlocaledrop\install.bat
    (Selfhost 7929)

    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of CE Whitehead
    Sent: Tuesday, March 01, 2011 1:33 PM
    To: kojiishi@gluesoft.co.jp; unicode@unicode.org
    Subject: Titlecasing words starting with numeric glyphs and period as word separator

    Hi, Koji:

    First, I would say "99ers", not "99Ers" -- I cannot imagine any case at all for "99Ers"
    (see http://www.google.com/#sclient=psy&hl=en&q=49ers&aq=0&aqi=g5&aql=f&oq=49ers&pbx=1&bav=on.1,or.&fp=42ea6e12edc6080 for online examples with 49ers;
    but feel free to submit a question about this to the Chicago Manual of Style:
    http://www.chicagomanualofstyle.org/QA_submit.html).
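
    To illustrate the "99ers" point, here is a minimal Python sketch; it is not a proposal for the CSS text. Python's built-in str.title() happens to restart capitalization after digits, which produces the "99Ers" form. The helper capitalize_word is a hypothetical name of my own; it leaves a word alone when the word does not begin with a letter, and it uses plain uppercasing rather than Unicode's Titlecase_Mapping.

```python
def capitalize_word(word: str) -> str:
    """Uppercase the first character only when the word starts with a letter."""
    if word and word[0].isalpha():
        return word[0].upper() + word[1:]
    # Words starting with a digit (or anything else) are returned unchanged.
    return word


print("99ers".title())            # built-in behavior: 99Ers
print(capitalize_word("99ers"))   # 99ers
print(capitalize_word("niners"))  # Niners
```

    Whether an implementation should behave this way for every language is a separate question; the sketch only shows that the "99Ers" result is avoidable.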

    For your rules for text transformation in css (http://dev.w3.org/csswg/css3-text/#text-transform)
    I would limit the rules set for titlecasing; that is, I might specify that nouns, adjectives, adverbs, and pronouns should be capitalized in English titles, but would not specify other, more "fuzzy" rules.

    The only rule needed for title casing A.M./a.m., AM/am, P.M./p.m., and PM/pm that I can surmise is that the "a" and "m" (or the "p" and "m") need to match; that is, if you title case the "a" you have to title case the "m" -- so I would not be happy with P.m.
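
    The matching rule can be sketched as a small check. This is only an illustration under my own assumptions: the pattern MERIDIEM and the function name meridiem_case_matches are hypothetical, and the pattern accepts "am"/"pm" with or without periods.

```python
import re

# Hypothetical pattern: "am" or "pm", optional periods, any letter case.
MERIDIEM = re.compile(r"^([AaPp])\.?([Mm])\.?$")


def meridiem_case_matches(token: str) -> bool:
    """Return True when both letters of an a.m./p.m. token share the same case."""
    match = MERIDIEM.match(token)
    if match is None:
        raise ValueError(f"not an a.m./p.m. token: {token!r}")
    first, second = match.groups()
    return first.isupper() == second.isupper()


print(meridiem_case_matches("p.m."))  # True
print(meridiem_case_matches("PM"))    # True
print(meridiem_case_matches("P.m."))  # False
```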

    Also, as far as I know, there should be no fixed rule about title casing English prepositions that are fewer than four letters long and are not the first word in an English title ("of" or "Of," "to" or "To," "in" or "In," "on" or "On"), and perhaps no rule even for English conjunctions of fewer than four letters ("and" or "And," "or" or "Or," "but" or "But"). In the case of conjunctions, though, I prefer lower case unless the conjunction is the first word in a title. Also, I would never capitalize "of" in a title unless it were the first word.

    (My way of title-casing the English title of a book, article, or journal is:
    First word = capital first letter
    Nouns, Adjectives, Adverbs, Pronouns = capital first letter
    Prepositions and Conjunctions of 4 letters or more in length = capital first letter
    This, That, These, Those (determiners) = capital first letter
    Conjunctions of less than 4 letters in length = lower case
    Prepositions of less than 4 letters in length = lower case
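
    The list above can be roughly sketched in Python as follows. This is an approximation under stated assumptions, not a real implementation: there is no part-of-speech tagging, so the sets SHORT_PREPOSITIONS and SHORT_CONJUNCTIONS are hypothetical and incomplete word lists standing in for it. Articles ("a," "an," "the") are not covered by my list, so the sketch assumes they are simply left as typed. Every other word gets a capital first letter, which covers the longer prepositions and conjunctions as well.

```python
# Hypothetical, incomplete word lists; a real implementation would need
# part-of-speech information, since many short words are ambiguous.
SHORT_PREPOSITIONS = {"at", "by", "for", "in", "of", "on", "to"}
SHORT_CONJUNCTIONS = {"and", "but", "nor", "or"}
ARTICLES = {"a", "an", "the"}  # assumption: not covered by the rules, left as typed


def title_case(title: str) -> str:
    """Apply the short-word rules to a whitespace-separated English title."""
    result = []
    for index, word in enumerate(title.split()):
        lowered = word.lower()
        if index == 0:
            # The first word always gets a capital first letter.
            result.append(word[:1].upper() + word[1:])
        elif lowered in SHORT_PREPOSITIONS or lowered in SHORT_CONJUNCTIONS:
            result.append(lowered)
        elif lowered in ARTICLES:
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(title_case("the lord of the rings"))  # The Lord of the Rings
print(title_case("of mice and men"))        # Of Mice and Men
print(title_case("life without parole"))    # Life Without Parole
```

    The sketch ignores punctuation attached to words and the hyphen case, so it only shows the shape of the rules.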

    And as the Chicago Manual of Style says, in a title, one of the above following a hyphen gets its first letter treated just as if it followed white space:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles22.html

    I don't know what to do with "etc" in a title but would probably capitalize it:
    http://www.chicagomanualofstyle.org/CMS_FAQ/Capitalization/Capitalization11.html

    However, what I would do with prepositions and conjunctions of 4 letters or more deviates slightly from the rules I read in the Chicago Manual of Style's info pages:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles04.html
    but see also:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles12.html);
    The Purdue OWL agrees with me that the short prepositions and conjunctions should not be capitalized in English:
    http://owl.english.purdue.edu/engagement/index.php?category_id=2&sub_category_id=1&article_id=42 

    For the complete list of questions already asked about titles at Chicago Manual of Style, go to:
    http://www.chicagomanualofstyle.org/CMS_FAQ/CapitalizationTitles/CapitalizationTitles_questions01.html


    I wonder if, in some cases, a "fuzzy logic" approach might be what is needed for titles (if it could be done without using much bandwidth).

    In any case, I would let the browser and application developers conduct statistical analysis for things like English prepositions and "etc" in titles.
    Also, if you'd like brief information on fuzzy logic, see:
    Kumar and Garg. "Intelligent Learning of Fuzzy Logic Controllers Via Neural Network and Genetic Algorithm." Duke University.
    http://www.duke.edu/~manish/UL_029.pdf (This is a pretty brief reference.)

    Best,

    --C. E. Whitehead
    cewcathar@hotmail.com


    From: Koji Ishii (kojiishi@gluesoft.co.jp)
    Date: Tue Feb 22 2011 - 01:15:46 CST


    ________________________________


    > Hello,
    > There's a discussion going on in W3C CSS mailing list[1] about specifications of the text-transform
    > property[2], specifically how the "capitalize" value that titlecase specified span of text.
    > During the discussion, two cases were presented:
    > 1. Titlecasing words starting with numeric glyphs (e.g., "99ers") can be "99Ers" if we follow the rules
    > defined in 5.18 Case Mappings. Is this discussed here and it's up to implementations to define which
    > words to apply titlecasing, or should this be fixed in Unicode spec?
    > 2. We're thinking to use UAX #24 to separate words and then apply Titlecase_Mapping to every word.
    > But doing so makes "a.m." to be "A.m." and it contradicts with the general publication rules[3]. While
    > I understand both separating words and titlecasing are ambiguous, cannot be perfect, and we must
    > make compromises. But since Unicode defines these two rules separately, I guess there's a possibility
    > that "word separating rules optimized for titlecasing" could be slightly different from general word
    > separating rules. I haven't thought much about counter-cases for not doing so, but I wonder if anyone
    > in this ML could have idea including whether we should do it or not, or we should include more other
    > cases.
    > Any feedback is greatly appreciated.

    I just note that, inside English titles, prepositions sometimes begin with a capital letter and sometimes do not; thus, for some parts of speech in titles, "fuzzy logic" might work better than rules.

    I think you can have PM/AM or pm/am or P.M./A.M. or p.m./a.m. too.

    Thus, restrict rules for English to nouns, verbs, adjectives, and longer prepositions and relativizers;
    for other languages the rules are different. So I am just saying: limit title casing rules to where there is no variation, and leave the rest to developers to implement, maybe using fuzzy logic.

    Best,

    --C. E. Whitehead
    cewcathar@hotmail.com


    > Regards,
    > Koji




    This archive was generated by hypermail 2.1.5 : Tue Mar 01 2011 - 17:05:31 CST