Re: French group separators

From: Philippe Verdy (
Date: Mon Jul 07 2003 - 09:47:16 EDT


    On Monday, July 07, 2003 2:04 PM, Peter Kirk <> wrote:

    > On 07/07/2003 04:15, Philippe Verdy wrote:
    > > The list separator in French is preferably the semicolon, rather
    > > than a comma (which must then have a space):
    > > => "123<thin space>;<standard space>456"
    > > The <thin space> is here also encoded according to the character
    > > encoding constraints and fonts (here also less wide than a digit,
    > > unbreakable and not justified).
    > Earlier he wrote:
    > > In strict historic English typography, the unbreakable whitespaces
    > > before punctuations are often smaller (sixth of cadratin) and
    > > that's why they are often missed in ASCII-only text.
    > I wonder if here we are confusing character encoding with adjustments
    > which should be made during rendering and typesetting - and which
    > perhaps in the days of hot metal were made by including thin spacers.
    > Are you really suggesting that the huge quantities of text in English,
    > French and other languages, in ASCII and Unicode, are actually wrongly
    > encoded, because there is almost invariably no character code for a
    > thin space before punctuation? Surely it would be much more sensible to
    > accept that this text is correctly encoded, and leave it to the text
    > rendering or typesetting process to adjust the position of punctuation
    > marks as appropriate.

    No, I did not suggest any such thing. In fact I wrote the opposite, by saying that there is a lot of variation in the actual space character used for punctuation in strict typographic typesetting.

    Whether French text is "correctly encoded" is not a matter for Unicode standardization: this is not a Unicode issue but an internationalization and localization issue, as well as a rendering decision made by the document author.

    Whichever space is used (and several other constraints may limit the choice), the French cultural convention remains the same: a space, not a dot, is the thousands grouping separator.

    The situation is less clear, however, for phone numbers: some writers use thin unbreakable spaces and others use dots, interchangeably. Phone numbers are generally grouped by 2 digits (for the standard 10-digit national format), by 3 digits (for special numbers, where that makes them easier to remember, like toll-free numbers "0 800 xxx xxx"), or written with no separator at all for national short numbers with 3 or 4 digits like "112" (the European emergency phone number, toll-free on wired lines and mobile phones). For phone numbers in France, we never use hyphens.
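
    To make the grouping rules above concrete, here is a minimal Python sketch. The function name format_fr_phone and its "special" flag are hypothetical, introduced only for illustration; the sketch does not try to detect special numbers by itself, and it defaults to U+00A0 as the separator although a dot or a thinner unbreakable space would fit the description equally well.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, used here as the default group separator


def format_fr_phone(digits: str, special: bool = False, sep: str = NBSP) -> str:
    """Group a French phone number given as a plain string of digits.

    Assumptions of this sketch (not an official rule set):
    - 3- or 4-digit short numbers such as "112" get no separator;
    - 10-digit national numbers are grouped by pairs;
    - 10-digit numbers flagged as special use the "0 800 xxx xxx" shape.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ASCII digits only")
    if len(digits) in (3, 4):
        return digits  # short numbers are written without any separator
    if len(digits) != 10:
        raise ValueError("only short and 10-digit national numbers are handled")

    # Group sizes: 1+3+3+3 for special numbers, five pairs otherwise.
    sizes = (1, 3, 3, 3) if special else (2, 2, 2, 2, 2)
    groups, pos = [], 0
    for size in sizes:
        groups.append(digits[pos:pos + size])
        pos += size
    return sep.join(groups)
```

    Passing sep="." gives the dotted variant that some writers use; no hyphen variant is offered, since hyphens are not used for French phone numbers.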

    As I said, I described the *ideal* encoding and rendering of the group separator, not any single encoding (which each author chooses). With all due respect to what Tex said, dots are never used as a thousands group separator by actual French writers, and you'll find them only in software incorrectly localized to French.

    The default Windows setting for this grouping character is the non-breaking space U+00A0, found in the default Windows code page 1252 used by Western European localizations of Windows. Few users feel the need to change it in their regional settings.
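
    As an illustration of digit grouping with U+00A0, here is a minimal Python sketch. The helper name group_thousands is hypothetical; it deliberately avoids the locale module, whose results depend on which locales are installed, and simply swaps the comma produced by Python's own "," format option for the chosen separator.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE (U+00A0)


def group_thousands(n: int, sep: str = NBSP) -> str:
    """Format an integer with its digits grouped by three, joined by sep."""
    # The "," format option inserts a comma every three digits;
    # replace it with the separator wanted for French text.
    return f"{n:,}".replace(",", sep)
```

    A typesetting step could pass a narrower unbreakable space as sep without touching the rest of the code, which matches the idea that the source encoding and the final rendering are separate decisions.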

    In Linux/Unix document translation projects, this NBSP is the de facto preferred encoding agreed on by the translation community, and its translation standard requires an NBSP before any two-glyph ending or closing punctuation sign (colon, semicolon, exclamation mark, question mark, closing double angle guillemet), and after any two-glyph opening punctuation sign (opening double angle guillemet). So I do think that NBSP is the best interoperable encoding for source text, but this says nothing about the actual typesetting of the documents, which may implicitly replace NBSP occurrences (in NBSP+punctuation or digit+NBSP+digit) by a narrower unbreakable and non-justified space (but certainly not by a dot).
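
    The punctuation rule above can be sketched in a few lines of Python. This is a naive illustration, not the translation community's actual tooling: the function name apply_fr_spacing is hypothetical, and the regular expressions will also touch colons in URLs or times such as "12:30", which a real tool would have to exclude.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE (U+00A0)

# Optional run of ordinary or no-break spaces that gets normalized to one NBSP.
_SPACES = "[ \u00a0]*"

# Before ; : ! ? and the closing guillemet. The lookbehind skips cases where the
# previous character is whitespace or one of the marks itself (so "?!" stays together).
_BEFORE = re.compile(r"(?<=[^\s;:!?])" + _SPACES + "([;:!?»])")

# After the opening guillemet, when some text follows it.
_AFTER = re.compile("«" + _SPACES + r"(?=\S)")


def apply_fr_spacing(text: str) -> str:
    """Put exactly one NBSP before ; : ! ? » and after « (naive sketch)."""
    text = _BEFORE.sub(NBSP + r"\1", text)
    text = _AFTER.sub("«" + NBSP, text)
    return text
```

    Because existing NBSPs are matched and rewritten as a single NBSP, running the function twice gives the same result as running it once.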

    Another convention used in English is the double space after a sentence-ending dot: this convention does not exist in French, and I think that it exists in English as a way to represent a large space (at least one cadratin wide) after this dot. In French the minimum width for this space is just a half-cadratin (so it matches the standard space), and this space can be word-justified (made wider) or removed when lines wrap at the right margin.

    This is unlike the thin spaces used for digit grouping, or for linking a punctuation sign to the adjacent character, which are unbreakable and not word-justified; they can still be enlarged, however, if word justification alone would create spaces that are too wide and intercharacter spacing or narrowing must be applied to produce a more uniformly colored text, notably for texts set in standard narrow columns (such as newspapers, classified ads, or phonebook white pages, where each line is roughly 53 characters or signs on average).
