Re: TR35

From: Mark Davis
Date: Thu May 13 2004 - 15:41:49 CDT


    You speak as if date or number formats had nothing to do with language. I very
    much disagree. If I have a message that says: "The date of the last version of
    this document was 2003年3月20日", nobody in their right mind would say that that is
    correct English. (More on that at the end of,
    as I pointed to).

    The core of what anyone means by locale is the language -- and that means, in
    our context, written language, thus including script (Cyrl vs Latn) and variants
    (such as US vs UK spelling). The choice of language affects most of what people
    traditionally associate with software globalization, including date, time,
    number, currency, formatting & parsing; segmentation (words, lines); collation
    and searching; resource bundle choice for translated text & appropriate icons, etc.

    So if that is all of what someone means by locale, then there is little point in
    distinguishing between "locale IDs" and "language IDs".

    There are attributes that are clearly orthogonal to language, like choice of
    timezone or choice of currency (not the *formatting* of them, but the *choice*).
    So if one's locale definition includes something like: language=sh-Cyrl-YU plus
    currency=EUR plus timezone=GMT, then that is clearly something far different
    than just language.

    If that is what someone means by locale, then one must clearly distinguish
    between "locale IDs" and "language IDs". Syntactically, locale IDs may be an
    extension of language IDs, since they do form the core. Or one could use some
    completely different structure. In CLDR, for example, we use RFC 3066 for the
    language part (actually an extension, anticipating RFC 3066bis), but then use an
    extension mechanism for additional features that are not captured by language.
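
To make the distinction concrete, here is a minimal sketch of a locale ID treated as a language ID plus optional orthogonal attributes. The `@key=value;key=value` suffix, the `LocaleId` class, and the `parse_locale_id` function are assumptions made up for this illustration; they are not presented as the actual CLDR syntax or API.

```python
from dataclasses import dataclass, field


@dataclass
class LocaleId:
    # Core of the locale: an RFC 3066-style language ID, e.g. "sh-Cyrl-YU".
    language_id: str
    # Choices that are orthogonal to language, e.g. currency or timezone.
    attributes: dict = field(default_factory=dict)


def parse_locale_id(text: str) -> LocaleId:
    """Split a locale ID into its language core and optional attributes.

    Assumed (hypothetical) syntax: "<language-id>@key=value;key=value".
    """
    language_id, _, extension = text.partition("@")
    attributes = {}
    for pair in extension.split(";"):
        if not pair:
            continue  # no extension part, or an empty segment
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()
    return LocaleId(language_id.strip(), attributes)


locale = parse_locale_id("sh-Cyrl-YU@currency=EUR;timezone=GMT")
print(locale.language_id)   # the language part only
print(locale.attributes)    # the choices that go beyond language
```

With this shape, a bare language ID such as "en-US" is still a valid locale ID with no attributes, which is the sense in which locale IDs can be a syntactic extension of language IDs.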

    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Peter Constable"
    To: "Unicode Mailing List"
    Sent: Thu, 2004 May 13 11:58
    Subject: RE: TR35

    > > > Moreover, you would never label a document for a
    > > > number format in order to determine how automated-formatting
    > > > of numbers should be done on the receiving system.
    > >
    > > You would not label it to determine formatting on the receiving system,
    > > but to determine interpretation (parsing) of formatted values in the
    > > received data. You need to know what the convention is to interpret the
    > > number 123.456 or the date 02/03/04.
    >
    > But as I pointed out earlier, you cannot know for certain how to
    > interpret it unless you know how it was generated; and if it was entered
    > manually by a human, you need to know what they were thinking. A locale
    > ID cannot tell you that. A locale ID is useful only if the string that's
    > received was generated automatically on the originating system (and you
    > know that to be the case), but I'm guessing that most of the time when
    > that actually happens, that string is going to be an isolated element
    > within a data structure.
    >
    > It is the case that in a significant number of situations the language
    > tag of content will include a region ID, and if I encounter a formatted
    > number or date string in the content, I can use that to guess what the
    > correct interpretation should be. But I'm not sure I'd want to build a
    > system for processing business transactions on such assumptions.
    >
    > Peter
    >
    > Peter Constable
    > Globalization Infrastructure and Font Technologies
    > Microsoft Windows Division
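
The ambiguity of the date 02/03/04 discussed in the quoted message can be illustrated with a short sketch. The three candidate conventions and the `possible_dates` helper are assumptions chosen for this example, not an exhaustive list of real-world formats; the point is only that the string alone does not say which convention produced it.

```python
from datetime import datetime

# Candidate conventions (an illustrative assumption, not an exhaustive list).
CANDIDATE_CONVENTIONS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def possible_dates(text):
    """Return every date the string could denote, keyed by convention."""
    results = {}
    for name, fmt in CANDIDATE_CONVENTIONS.items():
        try:
            results[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # the string is not valid under this convention
    return results


for convention, value in possible_dates("02/03/04").items():
    print(convention, "->", value.isoformat())
```

All three conventions accept the string and each yields a different calendar date, so a receiver needs some out-of-band signal (a locale or language tag, or knowledge of how the value was generated) to choose among them.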
