Re: Globalized lists

From: Mark Davis (mark.davis@icu-project.org)
Date: Tue Dec 13 2005 - 13:28:38 CST


    About the lists (and not the terminology), what we do in ICU for the few
    times when we need something like this is to use multiple message
    formats, each of which can be localized. They are then used iteratively
    for a list.

    Here is an example of how this can be done (using Java syntax -- in the
    latest version of Java you can also avoid the ugly new Object[]{...}
    construct).

    // All but emptyList are MessageFormats.
    // All are to be localized for the specific language; e.g. lastPair in
    // English uses the string resource "{0}, and {1}".

    switch (listItems.size()) {
    case 0: return emptyList;
    case 1: return singletonList.format(listItems.toArray());
    case 2: return doubletonList.format(listItems.toArray());
    default:
        // Join the first two items, then fold in each middle item in turn.
        String intermediateResult = linking.format(
            new Object[]{listItems.get(0), listItems.get(1)});
        for (int i = 2; i < listItems.size() - 1; ++i) {
            intermediateResult = linking.format(
                new Object[]{intermediateResult, listItems.get(i)});
        }
        // The final item is attached with its own pattern.
        return lastPair.format(
            new Object[]{intermediateResult, listItems.get(listItems.size() - 1)});
    }

    And here is an example of the English output:

    0 items: (none)
    1 item:  a
    2 items: a and b
    3 items: a, b, and c
    4 items: a, b, c, and d
    5 items: a, b, c, d, and e
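
    For anyone who wants to try this, the fragment above can be wrapped
    into a small self-contained class. This is only a sketch: the class
    name, the method name, and the hard-coded English patterns are
    assumptions for illustration. In real use each pattern would be
    loaded from a localized resource rather than written into the code.

```java
import java.text.MessageFormat;
import java.util.Arrays;
import java.util.List;

public class ListFormatSketch {
    // Assumed English patterns (hypothetical; normally loaded from
    // per-language resources instead of being hard-coded).
    static final String EMPTY_LIST = "(none)";
    static final MessageFormat SINGLETON = new MessageFormat("{0}");
    static final MessageFormat DOUBLETON = new MessageFormat("{0} and {1}");
    static final MessageFormat LINKING   = new MessageFormat("{0}, {1}");
    static final MessageFormat LAST_PAIR = new MessageFormat("{0}, and {1}");

    static String formatList(List<String> items) {
        switch (items.size()) {
        case 0: return EMPTY_LIST;
        case 1: return SINGLETON.format(items.toArray());
        case 2: return DOUBLETON.format(items.toArray());
        default:
            // Join the first two items, then fold in each middle item.
            String result = LINKING.format(
                new Object[] {items.get(0), items.get(1)});
            for (int i = 2; i < items.size() - 1; ++i) {
                result = LINKING.format(new Object[] {result, items.get(i)});
            }
            // Attach the final item with its own pattern.
            return LAST_PAIR.format(
                new Object[] {result, items.get(items.size() - 1)});
        }
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("a", "b", "c", "d", "e");
        // Print the formatted list for every prefix length 0..5.
        for (int n = 0; n <= all.size(); ++n) {
            System.out.println(n + " items: " + formatList(all.subList(0, n)));
        }
    }
}
```

    Note that MessageFormat instances are not safe for concurrent use,
    so shared static instances like these would need care in
    multithreaded code.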

    We haven't formalized this in ICU APIs, however, nor does CLDR currently
    have provision for it. Although we haven't gotten any complaints about
    it, it is certainly possible that there are languages for which this
    structure is insufficient for customary usage; one might, for example,
    need to treat the first pair in the default case (e.g. 3+ items)
    specially, or handle a tripleton specially.
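
    To make the first of those possibilities concrete, here is one way
    the scheme could be extended so that the first pair gets its own
    pattern. This is a sketch under assumptions, not an ICU or CLDR
    API: the class name and the start/middle/end pattern names are
    hypothetical, and the second pattern set in main is deliberately
    made up (it is not real data for any language) just to make the
    effect of a separate start pattern visible.

```java
import java.text.MessageFormat;
import java.util.Arrays;
import java.util.List;

public class StartMiddleEndSketch {
    // All fields except 'empty' are MessageFormat pattern strings.
    private final String empty, one, two, start, middle, end;

    public StartMiddleEndSketch(String empty, String one, String two,
                                String start, String middle, String end) {
        this.empty = empty;
        this.one = one;
        this.two = two;
        this.start = start;
        this.middle = middle;
        this.end = end;
    }

    public String format(List<String> items) {
        int n = items.size();
        if (n == 0) return empty;
        if (n == 1) return MessageFormat.format(one, items.get(0));
        if (n == 2) return MessageFormat.format(two, items.get(0), items.get(1));
        // 3+ items: the first pair uses 'start' instead of 'middle'.
        String result = MessageFormat.format(start, items.get(0), items.get(1));
        for (int i = 2; i < n - 1; ++i) {
            result = MessageFormat.format(middle, result, items.get(i));
        }
        return MessageFormat.format(end, result, items.get(n - 1));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");

        // English: start and middle happen to be the same pattern.
        StartMiddleEndSketch english = new StartMiddleEndSketch(
            "(none)", "{0}", "{0} and {1}", "{0}, {1}", "{0}, {1}", "{0}, and {1}");
        System.out.println(english.format(items));

        // Made-up patterns, only to show a distinct start pattern at work.
        StartMiddleEndSketch madeUp = new StartMiddleEndSketch(
            "(none)", "{0}", "{0} and {1}",
            "first {0}, then {1}", "{0}, {1}", "{0}, and {1}");
        System.out.println(madeUp.format(items));
    }
}
```

    A specially handled tripleton could be added the same way, as one
    more exact-size case checked before the general 3+ branch.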

    Mark

    Addison Phillips wrote:

    >Hi Mike,
    >
    >The notes on the list so far haven't really dealt with how one approaches coding the formatting of an array of values into a sentence, only with the grammatical problems inherent in generating the sentence fragment provided. Most approaches to this combine internationalization (writing code) with localization (tailoring the presentation to each specific language) through the use of resources.
    >
    >This is frequently done by using a pattern string coupled with dynamic replacement with the values selected.
    >
    >An example of this would be java.text.MessageFormat in Java (most modern operating environments have similar classes, functions, or APIs). The pattern string in this case would be "{0}, {1}, {2}, and {3}" in English. Other languages would use a different pattern string with the proper punctuation, spacing, wording, and the like (loaded from externalized resources of course). For detailed documentation, see: http://java.sun.com/j2se/1.5.0/docs/api/java/text/MessageFormat.html
    >
    >If the list can have different numbers of items, such an approach won't work by itself. One way to approach that is to have different pattern strings for different numbers of items. The Java example for this would be java.text.ChoiceFormat (there is a link on the above page).
    >
    >Even this approach has limits, since it requires one to have as many choice strings as there are options for arranging the list. Each language, as pointed out previously, will have its own idiosyncrasies (handling items with different counts, case, and so on may require more pattern strings or careful design of both text and the items inserted into the text). You may also need to consider some mixed-language or mixed-script problems (as when some items are in one script and some in another).
    >
    >Without knowing more specifics, it is difficult to advise you precisely. If you are trying to write general purpose code that can serve many languages, perhaps simultaneously, then avoiding the generation of long lists where possible might be a good idea. The more one fools around with count, gender, inter-word dependency and the like, the more likely one is to get it wrong somehow.
    >
    >With regard to definitions of internationalization and globalization, I note that the W3C has just posted a definition for the former: http://www.w3.org/International/questions/qa-i18n
    >
    >This version is pretty good, although the creation of it was fraught with argument. I'm pretty happy with it, since there are different definitions in use and everyone has their little spin on it. I'm personally fond of: "Internationalization is a fundamental architectural approach to enabling software to handle variations in culture, language, or geography."
    >
    >"Globalization" has the (somewhat negative) meaning that others on this list have ascribed to it in general usage. However, in the software world it has acquired a different meaning or set of meanings. Personally, I tend to favor something similar to IBM's definition:
    >
    >---
    >The process of developing, manufacturing, and marketing software products that are intended for worldwide distribution. This term combines two aspects of the work: internationalization (enabling the product to be used without language or culture barriers) and localization (translating and enabling the product for a specific locale).
    >---[ftp://ftp.software.ibm.com/software/globalization/documents/whyglobalization.pdf]
    >
    >Hope that's a good starting point.
    >
    >Addison
    >
    >Addison P. Phillips
    >Globalization Architect, Quest Software
    >
    >Internationalization is not a feature.
    >It is an architecture.
    >
    >
    >
    >>Date: Mon, 12 Dec 2005 17:14:16 -0800
    >>From: Mike Ayers <mayers@celequest.com>
    >>Subject: Globalized lists
    >>
    >>
    >> Given a dynamic list to enumerate into a sentence ( "dog", "cat",
    >>"mouse", "ostrich" ), how do I proceed? The list items themselves are
    >>already globalized, it is the creation of the syntactically correct
    >>localized string declaring the list ( "dog, cat, mouse, and ostrich" ).
    >> Pointers are fine.
    >>
    >>
    >> Thanks,
    >>
    >>/|/|ike
    >>
    >>
    >>P.S. Point me to some definitions, too, please. I find myself using
    >>"globalization" and "internationalization" interchangeably, which I'm
    >>rather sure is incorrect. I tried to find a trade dictionary online
    >>with no success.
    >>
    >>
    >>



    This archive was generated by hypermail 2.1.5 : Tue Dec 13 2005 - 13:43:44 CST