RE: Titlecasing words starting with numeric glyphs and period as word separator

From: CE Whitehead
Date: Sun Mar 06 2011 - 21:06:25 CST

    Hi, there were several errors in the last example in my last post. They were not exactly typos but something more serious: as I neared the end of my translation I did not pay enough attention to being exact. Sorry; I did not intend to mess up here.
    "j'y retourne immédiatement", in the part of the post on titles with multiple terms conjoined by "et" or "ou" ('and' or 'or'), should have been translated, "I'll get back on this right away"
    (sorry that I did not translate the French adverbial pronoun "y" ['there'] initially;
    also, in any case, the blogger has not gotten back on this).
    Also, "son Gisant" should have been translated as either "his Effigy" or "her Effigy",
    but there is no way to tell which translation is correct from the context provided.
    The corrections are provided again below.
    (If you have not read my last post, everything in it is here and corrected.)

    Subject: RE: Titlecasing words starting with numeric glyphs and period as word separator
    Date: Sun, 6 Mar 2011 13:18:46 -0500

    > Hi, I did finally find a few online sources on French titles. (I now tend to agree that just
    > capitalizing the first word for Latin and Cyrillic is the best solution for now, so the information
    > may not be that helpful, but it's here for anyone interested. I think it's easier to get the rules for
    > English titles, and perhaps these could be fixed in the CLDR database, if you can fix a rule for
    > recognizing a title; I think it's easier to fix a rule for recognizing a first word in a sentence or
    > title . . . which Mark said might need discussion.):

    > 1.

    > According to wordreference's forum discussion:
    > "The answer is fairly simple: don't capitalize words following a colon, semicolon or comma.
    > "Les Ordinateurs et l'Humanité: une guerre pour Sion ?"

    > On the other hand, as Philippe noted, words joined by "et" or "ou" ('and' or 'or') are capitalized.
    > 2.

    > ( A post here points out that rules were more like English before the 19th century)

    > The above site has information about the differences in capitalization between titles that are
    > complete sentences or have a conjugated verb phrase and titles that do not.

    > 3.


    > This site provides information about the use of capital letters in words that refer to places, such
    > as "Ouest" or "Occident" ('West'), "Orient" ('East', 'Orient'); but "Sud-Est asiatique" ('Southeast
    > Asia', because in French the reference to Asia is not a noun but an adjective; literally the phrase
    > reads, '[the] Asiatic Southeast'); but "l'ouest de la France" ('the west of France'), "lac" ('lake'),
    > etc., and surprisingly, "l'océan Pacifique" (why does the adjective 'Pacific' get capitalized here but
    > not the noun 'ocean'? 'Pacific' is the word that distinguishes the place, while 'ocean' is just a
    > common noun; that is not the case for 'Ile-' or 'Isle-' in the hyphenated phrase "Ile-de-France").
    > And of course, as Philippe pointed out in an off-list email, you capitalize "Les Français" and "Les
    > Belges," but not the word "français" in "les Belges français."
    { "français" is of course an adjective; "Belges" is the noun }
    > So getting French place names right may be out of scope for Unicode/CLDR.

    >> Date: Thu, 3 Mar 2011 02:01:41 +0100
    >> Subject: Re: Titlecasing words starting with numeric glyphs and period as word separator
    >> 2011/3/2 Mark Davis ☕:
    >> > I have a typo in the following. Should have written:
    >> > l’histoire du Québec => L’histoire du Québec
    >> Unlike English, the French rules for capitalizing titles are much more
    >> strict: almost no words are uppercased, only the
    >> first word, plus the next one if the first word is a definite article
    >> (« Le, La, Les, L’ ») because the article is not significant for collation (in
    >> fact it is not written with a "majuscule", but just with a typographic
    >> capital: French makes a clear distinction between capitals, which are
    >> a typographic presentation, mandatory at the beginning of sentences,
    >> and majuscules, which are orthographic and invariant in dictionaries,
    >> notably for proper names).
    >> There are additional rules when a title is not a verbal sentence (i.e.
    >> not a full sentence with at least a subject and conjugated verb):
    > According to
    > "Si le titre . . . s’il consiste en une phrase conjuguée, seul le premier terme prend la
    > majuscule :
    > " Le train sifflera trois fois."
    > "If the title . . . if it consists of a complete sentence only the first word is capitalized" (that is,
    > the first word, not the first noun; even if the first word is a definite article, the noun after it is not capitalized):
    > "Le train sifflera trois fois."
    > "The train will whistle three times."
    > Also, here's the same info. again on titles that read as sentences:
    > "Attention, si le titre forme une phrase complète (sujet et verbe), seul le premier mot prendra la
    > capitale : 'La dialectique peut-elle casser des briques ?' Dans ce cas, pas de cap au premier
    > substantif."
    > "Careful, if the title forms a complete sentence (subject and verb), only the first word will begin
    > with a capital letter:
    > 'La dialectique peut-elle casser des briques ?' ('The dialectic, can it break bricks?') In this case,
    > no cap on the first noun."
    > The above rule apparently holds true for any headline or title that reads at all like a sentence,
    > as in:
    > “Quatre Irakiens tués dans des attaques, six corps découverts”.
    > ('Four Iraqis killed in attacks; six bodies found';
    > this heading is not in caps except for the first word and proper nouns, since it has a "phrase
    > conjuguée," which I translated as a 'complete sentence' earlier, though it's really any phrase
    > with a conjugated verb.)
    >> conjunctions like « et, ou ») : the conjunction is not capitalized, nor
    >> is the possible article after it. E.g. « Le Corbeau et le Renard
    >> », but these additional items are still capitalized individually.
    >> * If the first word of the title is not a definite article, but any
    >> other determiner, it is capitalized and does not force capitalizing
    >> other words after it (with the exception of enumerations). E.g. «
    >> Trois Hommes et un Couffin ».
    > There is an error however in this regard at

    > "Le Vieil Homme et la mer" ou "La Dolce Vita".
    > The first title should read, "Le Vieil Homme et la Mer"; it's corrected later in the blog:
    > "rédigé par : Anonyme | le 09 mai 2006 à 11:18 | . . .
    > “Le Vieil Homme et la Mer?”
    > "Roger-Max . . . a apporté sa réponse
    { The above does not go with the following; I copied off bits and pieces of this blog into notepad and then worked from that and not from the original; sorry }
    > "Alerter
    > "'Le Vieil Homme, sa Mère, son Gisant et les six cognes.' Quelque chose décogne…j’y retourne
    > immédiatement "
    > 'Written by: Anonymous: 9 May 2006 at 11:18 . . .
    > 'The Old Man and the Sea'
    > 'Roger-Max . . . has
    { I should have inserted "brought" here: 'has brought his response', or 'has provided his response' }
    > his response:
    { it's now another blogger who says the following: }
    > ''The Old Man, His Mother [which sounds like the word 'Sea' in French],
    > Their Effigy, and the six cops?' Something definitely-raps [? I'm not sure how to
    { "son Gisant" should have been translated "his effigy" or "her effigy", but as I did not know to which person the "Gisant" belongs, I translated it "their" }
    > translate "décogner"; maybe it's "déconner"; maybe someone else knows this term; I had
    > to guess its meaning from its parts],
    > I'll be back right away.'
    { "j'y retourne immédiatement" should have been translated, "I'll get back on this right away;" sorry I did not translate the "y" before. }
    > Why though is "et les six cognes," 'and the six cops,' in lower case here? (Not an important
    > question. And sorry for the delay in writing back on this.)

    { Does "et les six cognes" have the same importance as the other terms? Again not an important question. }
    > Best,
    > --C. E. Whitehead
    >> * If the second word after the definite article is not a noun but an
    >> adjective, the capitalization is carried over to the first noun after it.
    >> E.g. « Le Joli Mois de mai » (outside of a title, it reads
    >> orthographically as « le joli mois de mai » without any capital;
    >> there's no majuscule in either case), or « Les Trois Mousquetaires ».
    >> (The special exception for definite articles is strictly limited to
    >> these three words « Le, La, Les » and the elided form « L’ » or «
    >> La/Les », as they are extremely frequent in titles; the special rule
    >> for enumerations comes from the fact that the order is often not
    >> significant, or because multiple entries may be inserted in indices
    >> for each item in the enumeration.)
    >> They are important because they help with the correct sorting of titles in
    >> collections (notably for finding books in public or commercial
    >> libraries, or in collection indices), or music CDs or films in shops
    >> (if there's no significant author name).
    >> So this should really be: « L’Histoire du Québec », with a capital H,
    >> if this is an artistic production title (book name, song title, movie
    >> title). The rules are well known and widely respected in French (you can
    >> find these rules documented in most French typographic guides, as well
    >> as in French Wikipedia and French Wikibooks, where they are also
    >> a strongly applied convention).
    >> These French rules capitalize far fewer words than English in titles
    >> (but still retain all initial capitals on proper names).
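
As a rough illustration of the rules Philippe describes above, here is a minimal Python sketch. It is only an approximation under stated assumptions: the names `french_title_case` and `is_sentence` are hypothetical and not part of Unicode, CLDR, or any library. It covers only the first-word rule and the definite-article rule. The caller has to say whether the title is a full sentence, and the adjective and enumeration rules are not attempted, since they need part-of-speech information that a plain string function does not have.

```python
import re

# Definite articles that, per the rule quoted above, also trigger a
# capital on the following word. The elided form is matched separately.
DEFINITE_ARTICLES = {"le", "la", "les"}
ELIDED_ARTICLE = re.compile(r"^l[’'](?=\w)", re.IGNORECASE)


def capitalize_first(word: str) -> str:
    """Uppercase the first character of word and leave the rest untouched."""
    return word[:1].upper() + word[1:]


def french_title_case(title: str, is_sentence: bool = False) -> str:
    """Simplified sketch: first word, plus the word after a definite article.

    is_sentence is an assumption of this sketch: the caller decides whether
    the title contains a conjugated verb, in which case only the first word
    is capitalized. Adjectives and enumerations are NOT handled.
    """
    words = title.split(" ")
    first = words[0]
    if not first:
        return title
    elided = ELIDED_ARTICLE.match(first)
    if elided:
        # "l’histoire": the article and the next word share one token.
        rest = first[elided.end():]
        if not is_sentence:
            rest = capitalize_first(rest)
        words[0] = capitalize_first(elided.group(0)) + rest
    else:
        words[0] = capitalize_first(first)
        if not is_sentence and first.lower() in DEFINITE_ARTICLES and len(words) > 1:
            words[1] = capitalize_first(words[1])
    return " ".join(words)
```

For example, `french_title_case("l’histoire du Québec")` gives « L’Histoire du Québec », and `french_title_case("le train sifflera trois fois", is_sentence=True)` gives « Le train sifflera trois fois ». A title such as « Les Trois Mousquetaires » would come out wrong, because the sketch cannot tell that « trois » is not a noun.
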
    >> The Unicode "titlecase" algorithm clearly does not work at all for
    >> French and should NEVER be used there, as it was designed only with
    >> English in mind. My opinion is that this algorithm should be
    >> deprecated from the standard, and only given as informative for a
    >> limited set of languages, and for just a few contexts (but all your
    >> previous discussion on this list shows that the subject is extremely
    >> fuzzy, even in English, as it even breaks on various English proper
    >> names: better not to use it).
    > Mark suggested, I think, that at least the first word in a sentence or phrase might be
    > capitalized. I tend to think there is no harm in this, since this is done automatically by text
    > processors anyway, though yes, it's true that title case is only relevant for certain scripts (Latin
    > really; Cyrillic to some degree; not sure about others . . . it's not relevant for the Arabic script).
    >> Philippe.
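
The simpler fallback discussed in this thread, capitalizing only the first word of a sentence or title, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `capitalize_first_word` is hypothetical, and leaving text that starts with a digit unchanged is an assumption made here, not a rule anyone in the thread agreed on.

```python
def capitalize_first_word(text: str) -> str:
    """Uppercase the first letter of text and leave everything else as is.

    Leading punctuation and spaces (e.g. an opening guillemet) are skipped.
    Assumption of this sketch: if the first letter-or-digit is a digit,
    the text is returned unchanged.
    """
    for i, ch in enumerate(text):
        if ch.isdigit():
            return text
        if ch.isalpha():
            # upper() leaves letters of caseless scripts (e.g. Arabic) as is.
            return text[:i] + ch.upper() + text[i + 1:]
    return text
```

Applied to Mark's example, `capitalize_first_word("l’histoire du Québec")` gives « L’histoire du Québec ». Text in a caseless script passes through unchanged, which matches the point that title case is not relevant for the Arabic script.
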


