RE: Titlecasing words starting with numeric glyphs and period as word separator

From: CE Whitehead
Date: Sun Mar 06 2011 - 12:18:46 CST


    Hi, I did finally find a few online sources on French titles. I now tend to agree that just capitalizing the first word for Latin and Cyrillic is the best solution for now, so the information may not be that helpful, but it's here for anyone interested. I think it's easier to get the rules for English titles, and perhaps these could be fixed in the CLDR database, if you can fix a rule for recognizing a title. I think it's easier to fix a rule for recognizing the first word in a sentence or title, which Mark said might need discussion:

    According to wordreference's forum discussion:
    "The answer is fairly simple: don't capitalize words following a colon, semicolon or comma.
    "Les Ordinateurs et l'Humanité: une guerre pour Sion ?"

    On the other hand, as Philippe noted, words joined by "et" or "ou" ('and' or 'or') are capitalized.
    (A post here points out that the rules were more like English before the 19th century.)

    The above site has information about the differences in capitalization between titles that are complete sentences (or have a conjugated verb phrase) and titles that are not.


    This site provides information about the use of capital letters in words that refer to places. "Ouest" or "Occident" ('West') and "Orient" ('East', 'Orient') are capitalized, as is "Sud-Est asiatique" ('Southeast Asia'; in French the reference to Asia is not a noun but an adjective, so the phrase literally reads '[the] Asiatic Southeast'). But "l'ouest de la France" ('the west of France'), "lac" ('lake'), etc. are not. Surprisingly, it is "l'océan Pacifique": why does the adjective 'Pacific' get capitalized here but not the noun 'ocean'? 'Pacific' is the word that distinguishes the place, while 'ocean' is just a common noun. That is not the case for 'Ile-' ('Isle-') in the hyphenated phrase "Ile-de-France," which keeps its capital.
    And of course, as Philippe pointed out in an off-list email, you capitalize "Les Français" and "Les Belges," but not the word "français" in "les Belges français."

    So getting French place names right may be out of scope for Unicode/CLDR.

    > Date: Thu, 3 Mar 2011 02:01:41 +0100
    > Subject: Re: Titlecasing words starting with numeric glyphs and period as word separator
    > 2011/3/2 Mark Davis ☕:
    > > I have a typo in the following. Should have written:
    > > l’histoire du Québec => L’histoire du Québec
    > Unlike English, the French rules for capitalizing titles are much more
    > strict : there's no uppercasing of almost all words, but only the
    > first word, plus the next one if the first word is a definite article
    > (« Le, La, Les, L’ ») because it is not significant for collation (in
    > fact it is not written with a "majuscule", but just as a typographic
    > capital : French makes a clear distinction between capitals, which is
    > a typographic presentation, mandatory at the beginning of sentences,
    > and majuscules which are orthographic and invariant in dictionaries,
    > notably for proper names).
    > There are additional rules when a title is not a verbal sentence (i.e.
    > not a full sentence with at least a subject and conjugated verb) :
    According to
    "Si le titre . . . s’il consiste en une phrase conjuguée, seul le premier terme prend la majuscule :
    "Le train sifflera trois fois."
    "If the title . . . if it consists of a complete sentence, only the first word is capitalized" (the first word, not the first noun: even when the first word is a definite article, the noun after it is not capitalized):
    "Le train sifflera trois fois."
    "The train will whistle three times."
    Also, here's the same info. again on titles that read as sentences:

    "Attention, si le titre forme une phrase complète (sujet et verbe), seul le premier mot prendra la capitale : 'La dialectique peut-elle casser des briques ?' Dans ce cas, pas de cap au premier substantif."
    "Careful, if the title forms a complete sentence (subject and verb), only the first word will begin with a capital letter:
    "'La dialectique peut-elle casser des briques?' ('The dialectic can it break bricks?') In this case, no cap on the first noun."
    The above rule apparently holds true for any headline or title that reads at all like a sentence, as in:
    “Quatre Irakiens tués dans des attaques, six corps découverts”.
    ('Four Iraqis killed in attacks; six bodies found';
    this heading is not in caps except for the first word and proper nouns since it has a "phrase conjuguée," which I translated as a 'complete sentence' earlier though it's really any phrase with a conjugated verb.)
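
    The basic rule quoted above (first word, plus the next word when the first is a definite article) and the sentence-title exception can be sketched roughly as follows. This is only an illustration in Python: the function name and the is_sentence flag are my own assumptions, the caller has to decide whether the title contains a conjugated verb, and the other rules discussed in this thread are not handled.

```python
# Hypothetical sketch of the basic French title rule; not a CLDR or Unicode API.
DEFINITE_ARTICLES = {"le", "la", "les"}
APOSTROPHES = ("'", "\u2019")


def cap_first(word: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return word[:1].upper() + word[1:]


def french_title(title: str, is_sentence: bool) -> str:
    """Capitalize a title written in lower case (proper nouns already capitalized).

    is_sentence is supplied by the caller (assumption): True when the title
    has a conjugated verb, in which case only the first word gets a capital.
    Note that runs of whitespace are collapsed to single spaces.
    """
    words = title.split()
    if not words:
        return title
    first = words[0]
    # Elided article: "l’histoire" is a single whitespace-delimited token.
    elided = first[:1].lower() == "l" and first[1:2] in APOSTROPHES
    words[0] = cap_first(first)
    if is_sentence:
        return " ".join(words)
    if elided:
        # Also capitalize the word attached to the elided article.
        words[0] = words[0][:2] + cap_first(words[0][2:])
    elif first.lower() in DEFINITE_ARTICLES and len(words) > 1:
        # Capitalize the word right after "le", "la" or "les".
        words[1] = cap_first(words[1])
    return " ".join(words)


print(french_title("l’histoire du Québec", False))          # L’Histoire du Québec
print(french_title("le train sifflera trois fois", True))   # Le train sifflera trois fois
```

    The hard part, deciding whether a title is a "phrase conjuguée," is exactly what this sketch leaves to the caller.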

    > conjunctions like « et, ou ») : the conjunction is not capitalized, as
    > well as the possible article after it. E.g. « Le Corbeau et le Renard
    > », but these additional items are still capitalized individually.
    > * If the first word of the title is not a definite article, but any
    > other determiner, it is capitalized and does not force capitalizing
    > other words after it (with the exception of enumerations). E.g. «
    > Trois Hommes et un Couffin ».
    There is an error however in this regard at

    "Le Vieil Homme et la mer" ou "La Dolce Vita".
    The first title should read, "Le Vieil Homme et la Mer"; it's corrected later in the blog:
    "rédigé par : Anonyme | le 09 mai 2006 à 11:18 | . . .
    “Le Vieil Homme et la Mer?”
    "Roger-Max . . . a apporté sa réponse
    "'Le Vieil Homme , sa Mère, son Gisant et les six cognes.' Quelque chose décogne j’y retourne immédiatement "
    'Written by: Anonymous | 9 May 2006 at 11:18 . . .
    'The Old Man and the Sea?'
    'Roger-Max . . . has given his response:
    '"The Old Man, His Mother [which sounds like the word for 'Sea' in French], His Effigy, and the six cops." Something definitely-raps [? I'm not sure how to translate "décogner"; maybe it's "déconner"; maybe someone else knows this term; I had to guess its meaning from its parts], I'll be back right away.'
    Why though is "et les six cognes," 'and the six cops,' in lower case here? (Not an important question. And sorry for the delay in writing back on this.)
    --C. E. Whitehead
    > * if the second word after the definite article is not a noun but an
    > adjective, the capitalization is carried over to the first noun after it.
    > E.g. « Le Joli Mois de mai » (outside of a title, it reads
    > orthographically as « le joli mois de mai » without any capital,
    > there's no majuscule in both cases), or « Les Trois Mousquetaires ».
    > The special exception for definite articles is very limited to only
    > these three words « Le, La, Les » and the elided form « L’ » or «
    > La/Les », as they are extremely frequent in titles ; the special rule
    > for enumerations comes from the fact that the order is often not
    > significant, or because multiple entries may be inserted in indices
    > for each item in the enumeration).
    > They are important because they help correct sorting of titles in
    > collections (notably for finding books in public or commercial
    > libraries, or in collection indices), or music CD or films in shops
    > (if there's no significant author name).
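
    The point about sorting can be illustrated with a small sort key that skips a leading definite article. This is a rough Python sketch: the names are my own, and comparing casefolded strings is only a stand-in for real French collation (a CLDR-based collator would treat accents properly).

```python
import re

# Leading "Le ", "La ", "Les " or elided "L'" / "L’" at the start of a title.
_LEADING_ARTICLE = re.compile(r"^(?:(?:les|le|la)\s+|l['’])", re.IGNORECASE)


def title_sort_key(title: str) -> str:
    """Hypothetical sort key: drop a leading definite article, ignore case."""
    return _LEADING_ARTICLE.sub("", title, count=1).casefold()


titles = [
    "Les Trois Mousquetaires",
    "Le Corbeau et le Renard",
    "L’Histoire du Québec",
]
for t in sorted(titles, key=title_sort_key):
    print(t)
```

    With this key the three titles file under C, H and T rather than all under L.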
    > So this should really be: « L’Histoire du Québec », with a capital H,
    > if this is an artistic production title (book name, song title, movie
    > title). The rules are well known and very respected in French (you can
    > find these rules documented in most French typographic guides, as well
    > as in French Wikipedia, French Wikibooks, where they are also used as
    > a convention strongly applied).
    > These French rules capitalize far fewer words than English in titles
    > (but still retain all initial capitals on proper names).
    > The Unicode "titlecase" algorithm clearly does not work at all for
    > French and should NEVER be used there, as it was only designed with
    > English in mind. My opinion is that this algorithm should be
    > deprecated from the standard, and only given as informative for a
    > limited set of languages, and for just a few contexts (but all your
    > discussion previously on this list shows that the subject is extremely
    > fuzzy, even in English, as it even breaks on various English proper
    > names : better not use it).
    Mark suggested, I think, that at least the first word in a sentence or phrase might be capitalized. I tend to think there is no harm in this, since text processors do it automatically anyway, though yes, it's true that title case is only relevant for certain scripts (Latin really; Cyrillic to some degree; not sure about others . . . it's not relevant for the Arabic script).
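
    For comparison, the "first word only" approach is about as simple as it gets. Here is a minimal Python sketch (the function name is my own): it uppercases the first character when that character is a letter and leaves everything else alone, so a title starting with a digit is returned unchanged.

```python
def capitalize_first_word(text: str) -> str:
    """Uppercase the first letter of the text; leave the rest untouched.

    Leading whitespace is preserved. If the first visible character is not
    a letter (a digit, a quotation mark), the text is returned as is.
    Uses simple uppercasing, which is an assumption of this sketch.
    """
    stripped = text.lstrip()
    if not stripped or not stripped[0].isalpha():
        return text
    offset = len(text) - len(stripped)
    return text[:offset] + stripped[0].upper() + stripped[1:]


print(capitalize_first_word("l’histoire du Québec"))  # L’histoire du Québec
```

    This matches Mark's corrected example (l’histoire du Québec => L’histoire du Québec) and never touches later words, so proper names keep whatever capitals they already have.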
    > Philippe.


