Formatting Durations

The following is a strawman proposal for additions to CLDR 1.5 to support durations. The goal is to have a mechanism that allows for reasonable formatting of common durations, with a format which is as easy as possible for translators to use (with instruction).

Formats

  1. Add a set of flexible formats targeted at durations.
    <durationFormatItem id="hhmmss">hh:mm:ss</durationFormatItem>
    <durationFormatItem id="hhmm">hh:mm</durationFormatItem>
    <durationFormatItem id="hhhmmm">hhh mmm</durationFormatItem>
    <durationFormatItem id="wwwddd">ddd www</durationFormatItem>
  2. The one and two letter fields have their normal semantics, except that the numeric width of the top field is unbounded. Eg 108 hours and 23.5 minutes would be "108:23:30" with the first of the above formats. The difference between hh,HH,kk,KK is ignored: all hour fields are 0..∞.
  3. Fields of length 3 or above are handled by formatting them as special field values (see below), then substituting. Three letters (eg hhh) get the abbreviated format; four letters (eg hhhh) get the wide format.
  4. There is an additional concatenation format for fallback, eg "{0}, {1}". If there is no exact match, the longest initial match in big-endian order is used, and the results concatenated with this format. Eg, suppose the key is "dms", and there is no match. Then we try for "dm", then "d", and concatenate that result with the result of formatting the rest, using "{0}, {1}". So we might get "1 day, 3 minutes 17 seconds".
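
The fallback in item 4 can be sketched as follows. This is only an illustration under stated assumptions, not a proposed implementation: the pattern table, the function name format_duration, and the English strings are all hypothetical, and keys are assumed to use one letter per field (a key such as "hhhmmm" would need to be split on field boundaries rather than on characters).

```python
# Hypothetical pattern table: each key maps to a function that formats a dict
# of field values. The names and English strings are illustrative only.
PATTERNS = {
    "d": lambda v: "1 day" if v["d"] == 1 else f"{v['d']} days",
    "ms": lambda v: f"{v['m']} minutes {v['s']} seconds",
}


def format_duration(key, values, patterns=PATTERNS, concat="{0}, {1}"):
    """Format `values` for `key`, falling back to the longest initial match."""
    if key in patterns:
        return patterns[key](values)
    # No exact match: for the key "dms" this tries "dm", then "d".
    for end in range(len(key) - 1, 0, -1):
        head, rest = key[:end], key[end:]
        if head in patterns:
            # Format the matched prefix, then the remainder, and join them
            # with the concatenation format.
            return concat.format(
                patterns[head](values),
                format_duration(rest, values, patterns, concat),
            )
    raise KeyError(f"no pattern matches the start of {key!r}")


print(format_duration("dms", {"d": 1, "m": 3, "s": 17}))
```

With the table above, the key "dms" has no exact match and neither does "dm", so "d" is used and the remainder "ms" is formatted separately, giving "1 day, 3 minutes 17 seconds".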

Special Field Values

  1. We add structure to CLDR for each field type, something like the following.
    <durationLength type="wide">
      <duration type="h" number="one">1 Stunde</duration>
      <duration type="h" number="other">{0} Stunden</duration>
      <duration type="m" number="one">1 Minute</duration>
      ...
    </durationLength>
    <durationLength type="abbreviated">
      <duration type="h" number="other">{0}h</duration>
      <duration type="m" number="one">1m</duration>
      ...
    </durationLength>
  2. In these fields, {0} is a placeholder that uses the default number format for that locale.
  3. The number attribute keywords are defined to be the following, initially. (We would add more keywords as we find languages that need them.) So for Russian, the list corresponding to the one above would contain oneMod and fewMod. The other keyword is matched if none of the other keywords available for that locale match.
    keyword  condition tested                                                  comment
    zero     x == 0
    one      x == 1
    two      x == 2                                                            used in Slovenian
    some     x == 3 || x == 4                                                  used in Slovenian
    oneMod   x == 1 || x > 20 && (x mod 10) == 1                               used in Russian, Serbian,...
    fewMod   2 <= x && x <= 4 || x > 20 && 2 <= (x mod 10) && (x mod 10) <= 4  used in Russian, Serbian,...
    other    x == anything                                                     only matches if no other conditions true
  4. Issue: should we allow multiple keywords in the number attribute of a single element, like <duration type="h" number="one some">{0}zw</duration>? At this point I don't think it is necessary.
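
As an illustration of item 3, the keyword selection could look something like the sketch below. The conditions are copied from the list above; everything else is an assumption made for the example: the function names, testing the keywords in the order listed, and using str() in place of the locale's default number format for {0}.

```python
# Conditions from the keyword list above, tested in the order listed
# (the order is an assumption; the proposal does not specify one).
RULES = [
    ("zero", lambda x: x == 0),
    ("one", lambda x: x == 1),
    ("two", lambda x: x == 2),
    ("some", lambda x: x == 3 or x == 4),
    ("oneMod", lambda x: x == 1 or (x > 20 and x % 10 == 1)),
    ("fewMod", lambda x: 2 <= x <= 4 or (x > 20 and 2 <= x % 10 <= 4)),
]


def select_keyword(x, available):
    """Return the first keyword present in the locale data whose test passes."""
    for keyword, test in RULES:
        if keyword in available and test(x):
            return keyword
    # "other" only matches if no available keyword matched.
    return "other"


def format_field(x, forms):
    """Pick the form for x and substitute {0} (plain str(), not locale-aware)."""
    return forms[select_keyword(x, forms)].format(x)


# Hypothetical locale data, following the German example above.
GERMAN_HOURS = {"one": "1 Stunde", "other": "{0} Stunden"}
print(format_field(1, GERMAN_HOURS))
print(format_field(108, GERMAN_HOURS))

# A locale that supplies only oneMod and fewMod, as suggested for Russian.
RUSSIAN_KEYWORDS = {"oneMod", "fewMod", "other"}
for x in (1, 3, 5, 11, 21, 22):
    print(x, select_keyword(x, RUSSIAN_KEYWORDS))
```

Restricting the tests to the keywords a locale actually supplies is what lets the same list serve both German (one, other) and Russian (oneMod, fewMod, other).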

Expected API

The expected API would have certain parameters.

  1. It would allow the programmer to pass in a key (eg "hm" or "hhhmmm") as in flexible formats. (While it is possible to pass in fields of mixed lengths, we caution the programmer that it is unlikely that good results will obtain.)
  2. It would allow the least field to have fractions. The programmer could pass in just the min/max fractional digits (or maybe for generality a number format (for that locale of course)). Thus 108 hours 23.5 minutes would be "108:23.5" with the second of the above formats.
  3. It would allow leading, trailing, and/or interior zero fields (in any combination) to be suppressed. Suppose, for example, that the key is "dhms" and the actual value turns out to be 0 days 3 hours 0 minutes 5 seconds; the result then depends on which zero fields are suppressed. The actual key that is looked up changes when suppression is chosen. So if the key is "dhms" and the h value is zero, then "dms" is what is actually looked up in the flexible duration list.
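
The key rewriting described in item 3 might be sketched as below. The function name and its boolean parameters are hypothetical, keys are assumed to use one letter per field, and returning the key unchanged when every field is zero is an assumption, since the proposal does not say what should happen in that case.

```python
def suppress_zero_fields(key, values, leading=False, trailing=False, interior=False):
    """Return the key to look up after dropping the selected zero fields."""
    fields = list(key)
    nonzero = [i for i, f in enumerate(fields) if values.get(f, 0) != 0]
    if not nonzero:
        # Assumption: with no non-zero field there is nothing to anchor on,
        # so the key is left as it is.
        return key
    first, last = nonzero[0], nonzero[-1]
    kept = []
    for i, f in enumerate(fields):
        if values.get(f, 0) == 0:
            if i < first and leading:
                continue  # zero field before the first non-zero field
            if i > last and trailing:
                continue  # zero field after the last non-zero field
            if first < i < last and interior:
                continue  # zero field between two non-zero fields
        kept.append(f)
    return "".join(kept)


value = {"d": 0, "h": 3, "m": 0, "s": 5}
print(suppress_zero_fields("dhms", value, leading=True))
print(suppress_zero_fields("dhms", value, interior=True))
print(suppress_zero_fields("dhms", value, leading=True, interior=True))
```

For 0 days 3 hours 0 minutes 5 seconds, these three calls yield the lookup keys "hms", "dhs" and "hs" respectively; the resulting key is then looked up in the flexible duration list, with the fallback concatenation applying if there is no exact match.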