Technical Reports |

Part 3: Numbers

Version | 23 |

Editors | Mark Davis (markdavis@google.com) and other CLDR committee members |

Date | 2013-03-15 |

This Version | http://www.unicode.org/reports/tr35/tr35-31/tr35.html |

Previous Version | http://www.unicode.org/reports/tr35/tr35-29.html |

Latest Version | http://www.unicode.org/reports/tr35/ |

Corrigenda | http://unicode.org/cldr/corrigenda.html |

Latest Proposed Update | http://www.unicode.org/reports/tr35/proposed.html |

Namespace | http://cldr.unicode.org/ |

DTDs | http://unicode.org/cldr/dtd/23/ |

Revision | 31 |

This document describes parts of an XML format (*vocabulary*)
for the exchange of structured locale data. This format is used in the
Unicode Common Locale Data Repository.

This is a partial document, describing only those parts of the LDML that are relevant for number and currency formatting. For the other parts of the LDML see the main LDML document and the links above.

*This document has been reviewed by Unicode members and other
interested parties, and has been approved for publication by the Unicode
Consortium. This is a stable document and may be used as reference
material or cited as a normative reference by other specifications.*

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

*Please submit corrigenda and other comments with the CLDR bug reporting form [Bugs]. Related information that is useful in understanding
this document is found in the References. For the latest version of the Unicode Standard see [Unicode]. For
a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions].*

- 1 Numbering Systems
- 2 Number Elements
- 2.1 Default Numbering System
- 2.2 Other Numbering Systems
- 2.3 Number Symbols
- 2.4 Number Formats

- 3 Number Format Patterns
- 3.1 Number Patterns

- 4 Currencies
- 5 Language Plural Rules
- 6 Rule-Based Number Formatting
- 7 Parsing Numbers

<!ELEMENT numberingSystems ( numberingSystem* ) >

<!ELEMENT numberingSystem EMPTY >

<!ATTLIST numberingSystem id NMTOKEN #REQUIRED >

<!ATTLIST numberingSystem type ( numeric | algorithmic ) #REQUIRED >

<!ATTLIST numberingSystem radix NMTOKEN #IMPLIED >

<!ATTLIST numberingSystem digits CDATA #IMPLIED >

<!ATTLIST numberingSystem rules CDATA #IMPLIED >

Numbering systems information is used to define different
representations for numeric values to an end user. Numbering systems are
defined in CLDR as one of two different types: algorithmic and numeric.
Numeric systems are simply decimal-based systems that use a predefined set
of digits to represent numbers. Examples are Western (ASCII) digits, Thai
digits, and Devanagari digits. Algorithmic systems are more complex in nature,
since the proper formatting and presentation of a numeric quantity is based
on some algorithm or set of rules. Examples are Chinese numerals, Hebrew
numerals, or Roman numerals. In CLDR, the rules for presentation of numbers
in an algorithmic system are defined using the RBNF syntax described in
*Section 6: Rule-Based Number Formatting*.

Attributes for the <numberingSystem> element are as follows:

id - Specifies the name of the numbering system that can be used to designate its use in formatting.

type - Specifies whether the numbering system is algorithmic or numeric.

digits - For numeric systems, specifies the digits used to represent numbers, in order, starting from zero.

rules - Specifies the RBNF ruleset to be used for formatting numbers from this numbering system. The rules specifier can contain simply a ruleset name, in which case the ruleset is assumed to be found in the rule set grouping "NumberingSystemRules". Alternatively, the specifier can denote a specific locale, ruleset grouping, and ruleset name, separated by slashes.

Examples:

<numberingSystem id="latn" type="numeric" digits="0123456789"/> <!-- ASCII digits - A numeric system -->

<numberingSystem id="thai" type="numeric" digits="๐๑๒๓๔๕๖๗๘๙"/> <!-- A numeric system using Thai digits -->

<numberingSystem id="geor" type="algorithmic" rules="georgian"/> <!-- An algorithmic system - Georgian numerals , rules found in NumberingSystemRules -->

<numberingSystem id="hant" type="algorithmic" rules="zh_Hant/SpelloutRules/spellout-cardinal"/> <!-- An algorithmic system. Traditional Chinese Numerals -->

For general information about the numbering system data, including the BCP47 identifiers, see the main document.
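As a rough illustration of how the `digits` attribute of a numeric system could be applied, the following Python sketch maps ASCII digits onto the digits of another numeric system. The function name `to_numeric_system` is hypothetical and not part of CLDR; the Thai digit string is the one from the example above.

```python
def to_numeric_system(ascii_number: str, digits: str) -> str:
    """Replace ASCII digits with the matching digits of a numeric system.

    `digits` is the value of the `digits` attribute: ten digits, in order,
    starting from zero. Characters that are not ASCII digits pass through.
    """
    if len(digits) != 10:
        raise ValueError("a decimal numeric system needs exactly ten digits")
    table = {ord(str(i)): digits[i] for i in range(10)}
    return ascii_number.translate(table)


THAI_DIGITS = "๐๑๒๓๔๕๖๗๘๙"  # digits attribute of the "thai" system above

print(to_numeric_system("2013", THAI_DIGITS))
```

Only the digits are replaced here; separators and other symbols would come from the locale's number symbols for that numbering system.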

<!ELEMENT numbers (alias | (defaultNumberingSystem*, otherNumberingSystems*, symbols*, decimalFormats*, scientificFormats*, percentFormats*, currencyFormats*, currencies?, special*)) >

The numbers element supplies information for formatting and parsing numbers and currencies. It has the following sub-elements:
<defaultNumberingSystem>, <otherNumberingSystems>, <symbols>, <decimalFormats>,
<scientificFormats>, <percentFormats>, <currencyFormats>, and <currencies>. The currency IDs are from [ISO4217] (plus some additional
common-use codes). For more information, including the pattern structure, see *Section 3: Number Format Patterns*.

<!ELEMENT defaultNumberingSystem ( #PCDATA ) >

This element indicates which numbering system should be used for presentation of numeric quantities in the given locale.

<!ELEMENT otherNumberingSystems ( alias | ( native*, traditional*, finance*)) >

This element defines general categories of numbering systems that are sometimes used in the given locale for formatting numeric quantities. These additional numbering systems are often used in very specific contexts, such as in calendars or for financial purposes. There are currently three defined categories, as follows:

- **native** - Defines the numbering system used for the native digits, usually defined as a part of the script used to write the language. The native numbering system can only be a numeric positional decimal-digit numbering system, using digits with General_Category=Decimal_Number.
- **traditional** - Defines the traditional numerals for a locale. This numbering system may be numeric or algorithmic. If the traditional numbering system is not defined, applications should use the native numbering system as a fallback.
- **finance** - Defines the numbering system used for financial quantities. This numbering system may be numeric or algorithmic. This is often used for ideographic languages such as Chinese, where it would be easy to alter an amount represented in the default numbering system simply by adding additional strokes. If the financial numbering system is not specified, applications should use the default numbering system as a fallback.

The categories defined for other numbering systems can be used in a Unicode locale identifier to select the proper numbering system without having to know the specific numbering system by name. For example:

- To select Hindi language using the native digits for numeric formatting, use locale ID: "hi-IN-u-nu-native".
- To select Chinese language using the appropriate financial numerals, use locale ID: "zh-u-nu-finance".
- To select Tamil language using the traditional Tamil numerals, use locale ID: "ta-u-nu-traditio".

For more information on numbering systems and their definitions, see *Section 1: Numbering Systems*.
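The fallback behavior described for the three categories can be sketched as follows. This is a minimal Python illustration, not a normative algorithm: the function name `resolve_numbering_system` is hypothetical, the sample locale data is illustrative only, and falling back to the default system when neither traditional nor native is defined is an assumption.

```python
from typing import Dict


def resolve_numbering_system(category: str, default: str, others: Dict[str, str]) -> str:
    """Pick a numbering system id for 'native', 'traditional', or 'finance'.

    `default` stands for the locale's <defaultNumberingSystem>; `others` holds
    whichever <otherNumberingSystems> categories the locale defines.
    """
    if category in others:
        return others[category]
    if category == "traditional" and "native" in others:
        return others["native"]  # traditional falls back to native
    # finance falls back to the default system; using the default for the
    # remaining cases as well is an assumption of this sketch
    return default


# Illustrative data only: a locale whose default is "latn" and native is "deva".
sample_others = {"native": "deva"}
print(resolve_numbering_system("traditional", "latn", sample_others))
print(resolve_numbering_system("finance", "latn", sample_others))
```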

<!ELEMENT symbols (alias | (decimal*, group*, list*, percentSign*, nativeZeroDigit*, patternDigit*, plusSign*, minusSign*, exponential*, perMille*, infinity*, nan*, currencyDecimal*, currencyGroup*, special*)) >

Number symbols define the localized symbols that are commonly used when formatting numbers in a given locale. These symbols
can be referenced using a number formatting pattern as defined in *Section 3: Number Format Patterns*.

The available number symbols are as follows:

- **decimal** - separates the integer and fractional part of the number.
- **group** - separates clusters of integer digits to make large numbers more legible; commonly used for thousands (grouping size 3, e.g. "100,000,000") or in some locales, ten-thousands (grouping size 4, e.g. "1,0000,0000"). There may be two different grouping sizes: the *primary grouping size* used for the least significant integer group, and the *secondary grouping size* used for more significant groups; these are not the same in all locales (e.g. "12,34,56,789"). If a pattern contains multiple grouping separators, the interval between the last one and the end of the integer defines the primary grouping size, and the interval between the last two defines the secondary grouping size. All others are ignored, so "#,##,###,####" == "###,###,####" == "##,#,###,####".
- **list** - separates lists of numbers.
- **percentSign** - symbol used to indicate a percentage (1/100th) amount. (If present, the value is also multiplied by 100 before formatting. That way 1.23 → 123%)
- **nativeZeroDigit** - Deprecated - do not use.
- **patternDigit** - Symbol used to indicate any digit value, typically #. When that digit is zero, then it is not shown.
- **minusSign** - Symbol used to denote negative value.
- **plusSign** - Symbol used to denote positive value.
- **exponential** - Symbol separating the mantissa and exponent values.
- **perMille** - symbol used to indicate a per-mille (1/1000th) amount. (If present, the value is also multiplied by 1000 before formatting. That way 1.23 → 1230 [1/1000])
- **infinity** - The infinity sign. Corresponds to the IEEE infinity bit pattern.
- **nan** (Not a number) - The NaN sign. Corresponds to the IEEE NaN bit pattern.
- **currencyDecimal** - This is used as the decimal separator in currency formatting/parsing, instead of the regular decimal separator. This item is optional in the CLDR.
- **currencyGroup** - This is used as the grouping separator in currency formatting/parsing, instead of the regular grouping separator. This item is optional in the CLDR.

Example:

<symbols> <decimal>.</decimal> <group>,</group> <list>;</list> <percentSign>%</percentSign> <patternDigit>#</patternDigit> <plusSign>+</plusSign> <minusSign>-</minusSign> <exponential>E</exponential> <perMille>‰</perMille> <infinity>∞</infinity> <nan>☹</nan> </symbols>

<!ATTLIST symbols numberSystem CDATA #IMPLIED >

The numberSystem attribute is used to specify that the given number symbols
are to be used when the given numbering system is active. Number symbols can only be defined for
numbering systems of the "numeric" type, since any special symbols required for an algorithmic numbering system
should be specified by the RBNF formatting rules used for that numbering system. By default, number
symbols without a specific numberSystem attribute are assumed to be used for the "latn"
numbering system, which is western (ASCII) digits. Locales that specify a numbering system other
than "latn" as the default should also specify number formatting symbols that are appropriate
for use within the context of the given numbering system. For example, a locale that uses the Arabic-Indic
digits as its default would likely use an Arabic comma for the grouping separator rather than the ASCII comma.

For more information on numbering systems and their definitions, see *Section 1: Numbering Systems*.

<!ELEMENT decimalFormats (alias | (default*, decimalFormatLength*, special*))>

<!ELEMENT decimalFormatLength (alias | (default*, decimalFormat*, special*))>

<!ATTLIST decimalFormatLength type ( full | long | medium | short ) #IMPLIED >

<!ELEMENT decimalFormat (alias | (pattern*, special*)) >

(scientificFormats, percentFormats have the same structure)

Number formats are used to define the rules for formatting numeric quantities using the pattern syntax described in
*Section 3: Number Format Patterns*.

Different formats are provided for different contexts, as follows:

- **decimalFormats** - The normal locale specific way to write a base 10 number. Variations of the decimalFormat pattern are provided that allow compact number formatting.
- **percentFormats** - Pattern for use with percentage formatting.
- **scientificFormats** - Pattern for use with scientific (exponent) formatting.

Example:

<decimalFormats> <decimalFormatLength type="long"> <decimalFormat> <pattern>#,##0.###</pattern> </decimalFormat> </decimalFormatLength> </decimalFormats>

<scientificFormats> <default type="long"/> <scientificFormatLength type="long"> <scientificFormat> <pattern>0.000###E+00</pattern> </scientificFormat> </scientificFormatLength> <scientificFormatLength type="medium"> <scientificFormat> <pattern>0.00##E+00</pattern> </scientificFormat> </scientificFormatLength> </scientificFormats>

<percentFormats> <percentFormatLength type="long"> <percentFormat> <pattern>#,##0%</pattern> </percentFormat> </percentFormatLength> </percentFormats>

<!ATTLIST decimalFormats numberSystem CDATA #IMPLIED >

The numberSystem attribute is used to specify that the given number formatting pattern(s)
are to be used when the given numbering system is active. By default, number
formatting patterns without a specific numberSystem attribute are assumed to be used for the "latn"
numbering system, which is western (ASCII) digits. Locales that specify a numbering system other
than "latn" as the default should also specify number formatting patterns that are appropriate
for use within the context of the given numbering system.

For more information on numbering systems and their definitions, see *Section 1: Numbering Systems*.

<decimalFormatLength type="long">

<decimalFormat>

<pattern type="1000" count="one">0 millier</pattern>

<pattern type="1000" count="other">0 milliers</pattern>

<pattern type="10000" count="one">00 mille</pattern>

<pattern type="10000" count="other">00 mille</pattern>

<pattern type="100000" count="one">000 mille</pattern>

<pattern type="100000" count="other">000 mille</pattern>

<pattern type="1000000" count="one">0 million</pattern>

<pattern type="1000000" count="other">0 millions</pattern>

...

</decimalFormat>

</decimalFormatLength>

<decimalFormatLength type="short">

<decimalFormat>

<pattern type="1000" count="one">0 K</pattern>

<pattern type="1000" count="other">0 K</pattern>

<pattern type="10000" count="one">00 K</pattern>

<pattern type="10000" count="other">00 K</pattern>

<pattern type="100000" count="one">000 K</pattern>

<pattern type="100000" count="other">000 K</pattern>

<pattern type="1000000" count="one">0 M</pattern>

<pattern type="1000000" count="other">0 M</pattern>

...

</decimalFormat>

</decimalFormatLength>

To format a number N, the greatest type less than or equal to N is used, with the appropriate plural category. N is divided by the type, after removing the number of zeros in the pattern, less 1. APIs supporting this format should provide control over the number of significant or fraction digits.

Thus N=12345 matches `<pattern type="10000" count="other">00 K</pattern>`. N is divided by 1000 (obtained from 10000 after removing "00" and restoring one "0"). The result is formatted according to the normal decimal pattern. With no fractional digits, that yields "12 K".

The short format is designed for UI environments where space is at a premium, and should ideally result in a formatted string no more than about 6 em wide (with no fractional digits).
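The selection and division steps for compact formats can be sketched in Python as below. This is only an illustration under stated assumptions: `format_compact` and `SHORT_PATTERNS` are hypothetical names, the plural category is supplied by the caller rather than computed, the scaled value is rounded half-even to zero fraction digits, and every "0" in the pattern is assumed to belong to the numeric placeholder.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# A subset of the short patterns shown above, keyed by (type, count).
SHORT_PATTERNS = {
    (1000, "other"): "0 K",
    (10000, "other"): "00 K",
    (100000, "other"): "000 K",
    (1000000, "other"): "0 M",
}


def format_compact(n: int, patterns, count: str = "other") -> str:
    """Format n with the greatest pattern type that is <= n."""
    types = [t for (t, c) in patterns if c == count and t <= n]
    if not types:
        return str(n)  # no compact pattern applies
    best = max(types)
    pattern = patterns[(best, count)]
    zeros = pattern.count("0")
    # Divide by the type after removing (number of zeros in the pattern - 1) zeros.
    divisor = best // 10 ** (zeros - 1)
    scaled = (Decimal(n) / Decimal(divisor)).quantize(Decimal(1), rounding=ROUND_HALF_EVEN)
    return pattern.replace("0" * zeros, str(scaled), 1)


print(format_compact(12345, SHORT_PATTERNS))  # 12 K
```

A fuller implementation would format the scaled value with the locale's normal decimal pattern and let the caller control significant or fraction digits.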

Pattern for use with currency formatting. This format contains a few additional structural options that allow proper placement of the currency symbol relative to the numeric quantity. Refer to *Section 4 - Currencies* for additional information on the use of these options.

<!ELEMENT currencyFormats (alias | (default*, currencySpacing*, currencyFormatLength*, unitPattern*, special*)) >

<!ELEMENT currencySpacing (alias | (beforeCurrency*, afterCurrency*, special*)) >

<!ELEMENT beforeCurrency (alias | (currencyMatch*, surroundingMatch*, insertBetween*)) >

<!ELEMENT afterCurrency (alias | (currencyMatch*, surroundingMatch*, insertBetween*)) >

<!ELEMENT currencyMatch ( #PCDATA ) >

<!ELEMENT surroundingMatch ( #PCDATA ) >

<!ELEMENT insertBetween ( #PCDATA ) >

<!ELEMENT currencyFormatLength (alias | (default*, currencyFormat*, special*)) >

<!ATTLIST currencyFormatLength type ( full | long | medium | short ) #IMPLIED >

<!ELEMENT currencyFormat (alias | (pattern*, special*)) >

<currencyFormats> <currencyFormatLength type="long"> <currencyFormat> <pattern>¤ #,##0.00;(¤ #,##0.00)</pattern> </currencyFormat> </currencyFormatLength> </currencyFormats>

Number patterns affect how numbers are interpreted in a localized context. Here are some examples, based on the French locale. The "." shows where the decimal point should go. The "," shows where the thousands separator should go. A "0" indicates zero-padding: if the number is too short, a zero (in the locale's numeric set) will go there. A "#" indicates no padding: if the number is too short, nothing goes there. A "¤" shows where the currency sign will go. The following illustrates the effects of different patterns for the French locale, with the number "1234.567". Notice how the pattern characters ',' and '.' are replaced by the characters appropriate for the locale.

Pattern | Currency | Text |
---|---|---|
`#,##0.##` | n/a | 1 234,57 |
`#,##0.###` | n/a | 1 234,567 |
`###0.#####` | n/a | 1234,567 |
`###0.0000#` | n/a | 1234,5670 |
`00000.0000` | n/a | 01234,5670 |
`# ##0.00 ¤` | EUR | 1 234,57 € |
`# ##0.00 ¤` | JPY | 1 235 ¥ |

The number of # placeholder characters before the decimal does not matter, since no limit is placed on the maximum number of digits. There should, however,
be at least one zero someplace in the pattern. In currency formats, the number of digits after the decimal also does not matter, since the information in the
supplemental data (see *Supplemental Currency Data*) is used to override the number of decimal places, and the rounding,
according to the currency that is being formatted. That can be seen in the above chart, with the difference between Yen and Euro formatting.

*When parsing using a pattern, a lenient parse should
be used; see Lenient Parsing.*

Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. For example, the '#' character is replaced by a localized digit. Often the replacement character is the same as the pattern character; in the U.S. locale, the ',' grouping character is replaced by ','. However, the replacement is still happening, and if the symbols are modified, the grouping character changes. Some special characters affect the behavior of the formatter by their presence; for example, if the percent character is seen, then the value is multiplied by 100 before being displayed.

To insert a special character in a pattern as a literal, that is, without any special meaning, the character must be quoted. There are some exceptions to this which are noted below.

Symbol | Location | Localized? | Meaning |
---|---|---|---|
`0` | Number | Yes | Digit |
`1-9` | Number | Yes | '1' through '9' indicate rounding. |
`@` | Number | No | Significant digit |
`#` | Number | Yes | Digit, zero shows as absent |
`.` | Number | Yes | Decimal separator or monetary decimal separator |
`-` | Number | Yes | Minus sign |
`,` | Number | Yes | Grouping separator |
`E` | Number | Yes | Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix. |
`+` | Exponent | Yes | Prefix positive exponents with localized plus sign. Need not be quoted in prefix or suffix. |
`;` | Subpattern boundary | Yes | Separates positive and negative subpatterns |
`%` | Prefix or suffix | Yes | Multiply by 100 and show as percentage |
`‰` (\u2030) | Prefix or suffix | Yes | Multiply by 1000 and show as per mille |
`¤` (\u00A4) | Prefix or suffix | No | Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If tripled, uses the long form of the decimal symbol. If present in a pattern, the monetary decimal separator and grouping separators (if available) are used instead of the numeric ones. |
`'` | Prefix or suffix | No | Used to quote special characters in a prefix or suffix, for example, `"'#'#"` formats 123 to `"#123"`. To create a single quote itself, use two in a row: `"# o''clock"`. |
`*` | Prefix or suffix boundary | Yes | Pad escape, precedes pad character |

A pattern contains a positive and may contain a negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, a numeric part, and a suffix. If there is no explicit negative subpattern, the negative subpattern is the localized minus sign prefixed to the positive subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are ignored in the negative subpattern. That means that "#,##0.0#;(#)" has precisely the same result as "#,##0.0#;(#,##0.0#)".
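The subpattern rules above can be illustrated with a small Python sketch that extracts the positive and negative affixes. It is a simplification under stated assumptions: the name `affixes` is hypothetical, quoting, padding, and exponents are not handled, an invalid pattern simply raises an error, and ASCII "-" stands in for the localized minus sign.

```python
import re

# prefix, numeric part, suffix; numeric characters are # 0-9 @ , .
_SUBPATTERN = re.compile(r"^([^#0-9@,.]*)([#0-9@,.]+)([^#0-9@,.]*)$")


def affixes(pattern: str):
    """Return ((pos_prefix, pos_suffix), (neg_prefix, neg_suffix), numeric_part)."""
    positive, _, negative = pattern.partition(";")
    pos_prefix, numeric, pos_suffix = _SUBPATTERN.match(positive).groups()
    if negative:
        # Only the prefix and suffix of an explicit negative subpattern are used;
        # its numeric part is ignored.
        neg_prefix, _ignored, neg_suffix = _SUBPATTERN.match(negative).groups()
    else:
        # No explicit negative subpattern: minus sign prefixed to the positive one.
        neg_prefix, neg_suffix = "-" + pos_prefix, pos_suffix
    return (pos_prefix, pos_suffix), (neg_prefix, neg_suffix), numeric


print(affixes("#,##0.00;(#,##0.00)"))
```

With this sketch, `affixes("#,##0.0#;(#)")` and `affixes("#,##0.0#;(#,##0.0#)")` return the same result, mirroring the equivalence stated above.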

Note: The thousands separator and decimal separator in this pattern are always ',' and '.'. They are substituted by the code with the correct local values according to other fields in CLDR.

The prefixes, suffixes, and various symbols used for infinity, digits, thousands separators, decimal separators,
and so on may be set to arbitrary values, and
they will appear properly during formatting. *However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable.*
For example, either the positive and negative prefixes or the suffixes must be distinct for any parser using this data to be able to distinguish positive from
negative values. Another example is that the decimal separator and thousands separator should be distinct characters, or parsing will be impossible.

The *grouping separator* is a character that separates clusters of integer digits to make large numbers more legible. It is commonly used for thousands,
but in some locales it separates ten-thousands. The *grouping size* is the number of digits between the grouping separators, such as 3 for "100,000,000"
or 4 for "1 0000 0000". There are actually two different grouping sizes: One used for the least significant integer digits, the *primary grouping size*,
and one used for all others, the *secondary grouping size*. In most locales these are the same, but sometimes they are different. For example, if the
primary grouping interval is 3, and the secondary is 2, then this corresponds to the pattern "#,##,##0", and the number 123456789 is formatted as "12,34,56,789".
If a pattern contains multiple grouping separators, the interval between the last one and the end of the integer defines the primary grouping size, and the
interval between the last two defines the secondary grouping size. All others are ignored, so "#,##,###,####" == "###,###,####" == "##,#,###,####".
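To make the primary and secondary grouping rule concrete, here is a minimal Python sketch. The helper names `grouping_sizes` and `group_digits` are hypothetical, the pattern is assumed to have no prefix, suffix, or quoting, and the ASCII comma is used in place of the localized grouping separator.

```python
def grouping_sizes(pattern: str):
    """Return (primary, secondary) grouping sizes; (0, 0) means no grouping."""
    integer_part = pattern.split(".")[0]
    groups = integer_part.split(",")
    if len(groups) == 1:
        return 0, 0
    primary = len(groups[-1])  # last separator to end of integer
    secondary = len(groups[-2]) if len(groups) > 2 else primary  # between last two
    return primary, secondary


def group_digits(digits: str, primary: int, secondary: int, sep: str = ",") -> str:
    """Insert separators into a string of integer digits."""
    if primary <= 0 or len(digits) <= primary:
        return digits
    head, tail = digits[:-primary], digits[-primary:]
    parts = [tail]
    while head:
        parts.insert(0, head[-secondary:])
        head = head[:-secondary]
    return sep.join(parts)


primary, secondary = grouping_sizes("#,##,##0")
print(group_digits("123456789", primary, secondary))  # 12,34,56,789
```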

For consistency in the CLDR data, the following conventions should be observed so as to have a canonical representation:

- All number patterns should be minimal: there should be no leading # marks except to specify the position of the grouping separators (for example, avoid ##,##0.###).
- All formats should have one 0 before the decimal point (for example, avoid #,###.##)
- Decimal formats should have three hash marks in the fractional position (for example, #,##0.###).
- Currency formats should have two zeros in the fractional position (for example, ¤ #,##0.00).
  - The exact number of decimals is overridden with the decimal count in supplementary data.
- The only time two thousands separators need to be used is when the number of digits varies, such as for Hindi: #,##,##0.

Formatting is guided by several parameters, all of which can be specified either using a pattern or using the API. The following description applies to formats that do not use scientific notation or significant digits.

- If the number of actual integer digits exceeds the *maximum integer digits*, then only the least significant digits are shown. For example, 1997 is formatted as "97" if the maximum integer digits is set to 2.
- If the number of actual integer digits is less than the *minimum integer digits*, then leading zeros are added. For example, 1997 is formatted as "01997" if the minimum integer digits is set to 5.
- If the number of actual fraction digits exceeds the *maximum fraction digits*, then half-even rounding is performed to the maximum fraction digits. For example, 0.125 is formatted as "0.12" if the maximum fraction digits is 2. This behavior can be changed by specifying a rounding increment and a rounding mode.
- If the number of actual fraction digits is less than the *minimum fraction digits*, then trailing zeros are added. For example, 0.125 is formatted as "0.1250" if the minimum fraction digits is set to 4.
- Trailing fractional zeros are not displayed if they occur *j* positions after the decimal, where *j* is less than the maximum fraction digits. For example, 0.10004 is formatted as "0.1" if the maximum fraction digits is four or less.
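The digit-count rules above can be sketched in Python with the standard `decimal` module. This is an illustration only: `format_digits` and its parameter names are hypothetical, the value is assumed to be non-negative, and the ASCII "." stands in for the localized decimal separator.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_digits(value: str, min_int: int = 1, max_int=None,
                  min_frac: int = 0, max_frac: int = 3) -> str:
    """Apply minimum/maximum integer and fraction digit counts."""
    # Half-even rounding to the maximum number of fraction digits.
    quantum = Decimal(10) ** -max_frac
    number = Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)
    integer, _, fraction = format(number, "f").partition(".")
    # Drop trailing fractional zeros, then restore any required by min_frac.
    fraction = fraction.rstrip("0").ljust(min_frac, "0")
    if max_int is not None:
        integer = integer[-max_int:]  # keep only the least significant digits
    integer = integer.zfill(min_int)  # add leading zeros if needed
    return integer + ("." + fraction if fraction else "")


print(format_digits("1997", max_int=2))    # 97
print(format_digits("0.125", max_frac=2))  # 0.12
```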

**Special Values**

`NaN` is represented as a single character, typically `(\uFFFD)`. This character is determined by the localized number symbols. This
is the only value for which the prefixes and suffixes are not used.

Infinity is represented as a single character, typically ∞ `(\u221E)`, with the positive or negative prefixes and suffixes
applied. The infinity character is determined by the localized number symbols.

Numbers in scientific notation are expressed as the product of a mantissa and a power of ten, for example, 1234 can be expressed as 1.234 × 10³.
The mantissa is typically in the half-open interval [1.0, 10.0) or sometimes [0.0, 1.0), but it need not be. In a pattern, the exponent character immediately
followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".

- The number of digit characters after the exponent character gives the minimum exponent digit count. There is no maximum. Negative exponents are formatted using the localized minus sign, *not* the prefix and suffix from the pattern. This allows patterns such as "0.###E0 m/s". To prefix positive exponents with a localized plus sign, specify '+' between the exponent and the digits: "0.###E+0" will produce formats "1E+1", "1E+0", "1E-1", and so on. (In localized patterns, use the localized plus sign rather than '+'.)
- The minimum number of integer digits is achieved by adjusting the exponent. Example: 0.00123 formatted with "00.###E0" yields "12.3E-4". This only happens if there is no maximum number of integer digits. If there is a maximum, then the minimum number of integer digits is fixed at one.
- The maximum number of integer digits, if present, specifies the exponent grouping. The most common use of this is to generate *engineering notation*, in which the exponent is a multiple of three, for example, "##0.###E0". The number 12345 is formatted using "##0.####E0" as "12.345E3".
- When using scientific notation, the formatter controls the digit counts using significant digits logic. The maximum number of significant digits limits the total number of integer and fraction digits that will be shown in the mantissa; it does not affect parsing. For example, 12345 formatted with "##0.##E0" is "12.3E3". See the section on significant digits for more details.
- Exponential patterns may not contain grouping separators.
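The exponent selection described above can be sketched in Python as follows. The names `split_scientific` and `format_scientific` are hypothetical, and the sketch covers only the exponent choice and fraction trimming: it does not implement the significant digits logic, the '+' prefix, localized symbols, or the case where rounding carries the mantissa into an extra integer digit.

```python
from decimal import Decimal


def split_scientific(value: str, min_int: int = 1, max_int=None):
    """Return (mantissa, exponent) for the given integer digit settings."""
    number = Decimal(value)
    adjusted = number.adjusted()  # power of ten of the most significant digit
    if max_int is None:
        # The minimum integer digits are achieved by adjusting the exponent.
        exponent = adjusted - (min_int - 1)
    else:
        # Exponent grouping: floor to a multiple of the maximum integer digits.
        exponent = (adjusted // max_int) * max_int
    return number.scaleb(-exponent), exponent


def format_scientific(value: str, min_int: int = 1, max_int=None,
                      max_frac: int = 3, min_exp_digits: int = 1) -> str:
    mantissa, exponent = split_scientific(value, min_int, max_int)
    text = format(round(mantissa, max_frac), "f")  # half-even rounding
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    sign = "-" if exponent < 0 else ""
    return f"{text}E{sign}{str(abs(exponent)).zfill(min_exp_digits)}"


print(format_scientific("1234"))                            # like "0.###E0"
print(format_scientific("0.00123", min_int=2))              # like "00.###E0"
print(format_scientific("12345", max_int=3, max_frac=4))    # like "##0.####E0"
```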

There are two ways of controlling how many digits are shown: (a) significant digits counts, or (b) integer and fraction digit counts. Integer and fraction digit counts are described above. When a formatter is using significant digits counts, the number of integer and fraction digits is not specified directly, and the formatter settings for these counts are ignored. Instead, the formatter uses however many integer and fraction digits are required to display the specified number of significant digits. Examples:

Pattern | Minimum significant digits | Maximum significant digits | Number | Output |
---|---|---|---|---|
`@@@` | 3 | 3 | 12345 | `12300` |
`@@@` | 3 | 3 | 0.12345 | `0.123` |
`@@##` | 2 | 4 | 3.14159 | `3.142` |
`@@##` | 2 | 4 | 1.23004 | `1.23` |

- In order to enable significant digits formatting, use a pattern containing the `'@'` pattern character. In order to disable significant digits formatting, use a pattern that does not contain the `'@'` pattern character.
- Significant digit counts may be expressed using patterns that specify a minimum and maximum number of significant digits. These are indicated by the `'@'` and `'#'` characters. The minimum number of significant digits is the number of `'@'` characters. The maximum number of significant digits is the number of `'@'` characters plus the number of `'#'` characters following on the right. For example, the pattern `"@@@"` indicates exactly 3 significant digits. The pattern `"@##"` indicates from 1 to 3 significant digits. Trailing zero digits to the right of the decimal separator are suppressed after the minimum number of significant digits have been shown. For example, the pattern `"@##"` formats the number 0.1203 as `"0.12"`.
- If a pattern uses significant digits, it may not contain a decimal separator, nor the `'0'` pattern character. Patterns such as `"@00"` or `"@.###"` are disallowed.
- Any number of `'#'` characters may be prepended to the left of the leftmost `'@'` character. These have no effect on the minimum and maximum significant digits counts, but may be used to position grouping separators. For example, `"#,#@#"` indicates a minimum of one significant digit, a maximum of two significant digits, and a grouping size of three.
- The number of significant digits has no effect on parsing.
- Significant digits may be used together with exponential notation. Such patterns are equivalent to a normal exponential pattern with a minimum and maximum integer digit count of one, a minimum fraction digit count of `Minimum Significant Digits - 1`, and a maximum fraction digit count of `Maximum Significant Digits - 1`. For example, the pattern `"@@###E0"` is equivalent to `"0.0###E0"`.
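The significant digits examples can be reproduced with a short Python sketch. The name `format_significant` is hypothetical; the sketch assumes half-even rounding, ignores grouping and localized symbols, and does not handle the edge case where rounding adds a digit (for example 9.996 with three significant digits).

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_significant(value: str, min_sig: int, max_sig: int) -> str:
    """Format with between min_sig and max_sig significant digits."""
    number = Decimal(value)
    # Round so that at most max_sig significant digits remain.
    quantum = Decimal(1).scaleb(number.adjusted() - max_sig + 1)
    rounded = number.quantize(quantum, rounding=ROUND_HALF_EVEN)
    # Suppress trailing zeros, but keep at least min_sig significant digits.
    stripped = rounded.normalize()
    min_exponent = rounded.adjusted() - min_sig + 1
    if stripped.as_tuple().exponent > min_exponent:
        stripped = stripped.quantize(Decimal(1).scaleb(min_exponent))
    return format(stripped, "f")


print(format_significant("12345", 3, 3))    # pattern @@@
print(format_significant("1.23004", 2, 4))  # pattern @@##
```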

Patterns support padding the result to a specific width. In a pattern the pad escape character, followed by a single pad character, causes padding to be
parsed and formatted. The pad escape character is '*'. For example, `"$*x#,##0.00"` formats 123 to `"$xx123.00"`, and 1234 to `"$1,234.00"`.

- When padding is in effect, the width of the positive subpattern, including prefix and suffix, determines the format width. For example, in the pattern `"* #0 o''clock"`, the format width is 10.
- Some parameters which usually do not matter have meaning when padding is used, because the pattern width is significant with padding. In the pattern "* ##,##,#,##0.##", the format width is 14. The initial characters "##,##," do not affect the grouping size or maximum integer digits, but they do affect the format width.
- Padding may be inserted at one of four locations: before the prefix, after the prefix, before the suffix, or after the suffix. No padding can be specified in any other location. If there is no prefix, before the prefix and after the prefix are equivalent, likewise for the suffix.
- When specified in a pattern, the code point immediately following the pad escape is the pad character. This may be any character, including a special pattern character. That is, the pad escape *escapes* the following character. If there is no character after the pad escape, then the pattern is illegal.
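The format width calculation can be illustrated with the Python sketch below. The names `parse_padding` and `pad_after_prefix` are hypothetical. The sketch assumes a positive subpattern in which the first '*' is the pad escape (not a quoted literal), counts a doubled single quote as one character, and only shows the "after the prefix" pad position.

```python
def parse_padding(pattern: str):
    """Return (pad_char, format_width, pattern_without_pad_spec)."""
    star = pattern.find("*")
    if star == -1:
        return None, 0, pattern  # no padding requested
    if star + 1 >= len(pattern):
        raise ValueError("pad escape must be followed by a pad character")
    pad_char = pattern[star + 1]
    rest = pattern[:star] + pattern[star + 2:]
    # The width is that of the subpattern without the pad escape and pad
    # character; '' stands for a single literal quote.
    width = len(rest.replace("''", "'"))
    return pad_char, width, rest


def pad_after_prefix(prefix: str, body: str, width: int, pad_char: str) -> str:
    """Insert padding between the prefix and the formatted number."""
    missing = max(0, width - len(prefix) - len(body))
    return prefix + pad_char * missing + body


pad_char, width, _ = parse_padding("$*x#,##0.00")
print(pad_after_prefix("$", "123.00", width, pad_char))    # $xx123.00
print(pad_after_prefix("$", "1,234.00", width, pad_char))  # $1,234.00
```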

**Rounding**

Patterns support rounding to a specific increment. For example, 1230 rounded to the nearest 50 is 1250. Mathematically, rounding to specific increments is performed by dividing by the increment, rounding to an integer, then multiplying by the increment. To take a more bizarre example, 1.234 rounded to the nearest 0.65 is 1.3, as follows:

Original: | 1.234 |
---|---|
Divide by increment (0.65): | 1.89846... |
Round: | 2 |
Multiply by increment (0.65): | 1.3 |

To specify a rounding increment in a pattern, include the increment in the pattern itself. "#,#50" specifies a rounding increment of 50. "#,##0.05" specifies a rounding increment of 0.05.

- Rounding only affects the string produced by formatting. It does not affect parsing or change any numerical values.
- An implementation may allow the specification of a *rounding mode* to determine how values are rounded. In the absence of such choices, the default is to round "half-even", as described in IEEE arithmetic. That is, it rounds towards the "nearest neighbor" unless both neighbors are equidistant, in which case it rounds towards the even neighbor. This behaves as round "half-up" if the digit to the left of the discarded fraction is odd, and as round "half-down" if it is even. Note that this is the rounding mode that minimizes cumulative error when applied repeatedly over a sequence of calculations.
- Some locales use rounding in their currency formats to reflect the smallest currency denomination.
- In a pattern, digits '1' through '9' specify rounding, but otherwise behave identically to digit '0'.
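The rounding rules above can be sketched in Python with the standard `decimal` module. This is a minimal illustration, not a reference implementation: the helper names `pattern_increment` and `round_to_increment` are hypothetical, and the pattern handling assumes a simple pattern with no prefix, suffix, or quoted literals.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def pattern_increment(pattern):
    """Read the rounding increment from the digits of a simple pattern."""
    kept = "".join(ch for ch in pattern if ch in "0123456789.")
    # An increment of zero means the pattern requests no increment rounding.
    return Decimal(kept) if any(ch != "." for ch in kept) else Decimal(0)


def round_to_increment(value, increment, mode=ROUND_HALF_EVEN):
    """Divide by the increment, round to an integer, multiply back."""
    value = Decimal(str(value))
    increment = Decimal(str(increment))
    if increment == 0:
        return value
    steps = (value / increment).quantize(Decimal("1"), rounding=mode)
    return steps * increment


print(pattern_increment("#,#50"))                       # 50
print(pattern_increment("#,##0.05"))                    # 0.05
print(round_to_increment("1230", 50))                   # 1250
print(round_to_increment("1.234", "0.65"))              # 1.30
print(round_to_increment("0.125", "0.05"))              # 0.10 (tie goes to the even step)
print(round_to_increment("0.125", "0.05", ROUND_HALF_UP))  # 0.15
```

Working in `Decimal` rather than binary floating point keeps ties such as 0.125 / 0.05 = 2.5 exact, so the half-even behavior is visible.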

Single quotes (') enclose bits of the pattern that should be treated literally. Inside a quoted string, two single quotes ('') are replaced with a single one ('). For example: `'X '#' Q '` -> `X 1939 Q` (the literal strings are the quoted parts of the pattern).

<!ELEMENT currencies (alias | (default?, currency*, special*)) >

<!ELEMENT currency (alias | (((pattern+, displayName*, symbol*) | (displayName+, symbol*, pattern*) | (symbol+, pattern*))?,
decimal*, group*, special*)) >

<!ELEMENT symbol ( #PCDATA ) >

<!ATTLIST symbol choice ( true | false ) #IMPLIED > <!-- deprecated -->

Note: The term "pattern" appears twice in the above. The first is for consistency with all other cases of pattern + displayName; the second is for backwards compatibility.

<currencies>
    <currency type="USD">
        <displayName>Dollar</displayName>
        <symbol>$</symbol>
    </currency>
    <currency type="JPY">
        <displayName>Yen</displayName>
        <symbol>¥</symbol>
    </currency>
    <currency type="PTE">
        <displayName>Escudo</displayName>
        <symbol>$</symbol>
    </currency>
</currencies>

In formatting currencies, the currency number format is used with the appropriate symbol from <currencies>, according to the currency code. The <currencies> list can contain codes that are no longer in current use, such as PTE. The choice attribute has been deprecated.

The count attribute distinguishes the different plural forms, such as in the following:

<currencyFormats>
    <unitPattern count="other">{0} {1}</unitPattern>
    ...
<currencies>
    <currency type="ZWD">
        <displayName>Zimbabwe Dollar</displayName>
        <displayName count="one">Zimbabwe dollar</displayName>
        <displayName count="other">Zimbabwe dollars</displayName>
        <symbol>Z$</symbol>
    </currency>

To format a particular currency value "ZWD" for a particular numeric value *n*:

- First see if there is a count with an explicit number (0 or 1). If so, use that string.
- Otherwise, determine the count value that corresponds to *n* using the rules in *Section 5 - Language Plural Rules*.
- Next, get the currency unitPattern.
    - Look for a unitPattern element that matches the count value, starting in the current locale and then following the locale fallback chain up to, but not including root.
    - If no matching unitPattern element was found in the previous step, then look for a unitPattern element that matches count="other", starting in the current locale and then following the locale fallback chain up to root (which has a unitPattern element with count="other" for every unit type).
    - The resulting unitPattern element indicates the appropriate positioning of the numeric value and the currency display name.
- Next, get the displayName element for the currency.
    - Look for a displayName element that matches the count value, starting in the current locale and then following the locale fallback chain up to, but not including root.
    - If no matching displayName element was found in the previous step, then look for a displayName element that matches count="other", starting in the current locale and then following the locale fallback chain up to, but not including root.
    - If no matching displayName element was found in the previous step, then look for a displayName element with no count, starting in the current locale and then following the locale fallback chain up to root.
    - If there is no displayName element, use the currency code itself (for example, "ZWD").
- The numeric value, formatted according to the locale with the number of decimals appropriate for the currency, is substituted for {0} in the unitPattern, while the currency display name is substituted for the {1}.

While for English this may seem overly complex, for some other languages different plural forms are used for different unit types; the plural forms for certain unit types may not use all of the plural-form tags defined for the language.

For example, if the currency is ZWD and the number is 1234, then the latter maps to count="other" for English. The unit pattern for that is "{0} {1}", and the display name is "Zimbabwe dollars". The final formatted number is then "1,234 Zimbabwe dollars".
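The lookup steps for plural currency names can be sketched as follows. This is only an illustration under stated assumptions: `CHAIN` is a hypothetical in-memory stand-in for the locale fallback chain (most specific locale first, root last), `english_count` hard-codes the English cardinal rule, the number is assumed to be already formatted for the locale, and the handling of explicit counts (0 or 1) is one possible reading of the first step.

```python
# Hypothetical stand-in for locale data: most specific locale first, root last.
CHAIN = [
    {   # "en"
        "unitPattern": {"other": "{0} {1}"},
        "displayName": {
            "ZWD": {
                None: "Zimbabwe Dollar",   # displayName with no count attribute
                "one": "Zimbabwe dollar",
                "other": "Zimbabwe dollars",
            },
        },
    },
    {   # root
        "unitPattern": {"other": "{0} {1}"},
        "displayName": {},
    },
]


def english_count(n):
    """English cardinal rule: 'one' for exactly 1, otherwise 'other'."""
    return "one" if n == 1 else "other"


def find(chain, table_of, count, include_root):
    """Return the first entry for `count` along the fallback chain, or None."""
    for data in (chain if include_root else chain[:-1]):
        table = table_of(data)
        if count in table:
            return table[count]
    return None


def format_currency(chain, code, n, formatted_number, plural_count):
    units = lambda data: data["unitPattern"]
    names = lambda data: data["displayName"].get(code, {})

    # Prefer an explicit count ("0" or "1") if present, else the plural category.
    explicit = "0" if n == 0 else "1" if n == 1 else None
    if explicit is not None and find(chain, names, explicit, False) is not None:
        count = explicit
    else:
        count = plural_count(n)

    # unitPattern: matching count (excluding root), then "other" (including root).
    pattern = find(chain, units, count, False) or find(chain, units, "other", True)

    # displayName: matching count, then "other", then no count, then the code.
    name = (find(chain, names, count, False)
            or find(chain, names, "other", False)
            or find(chain, names, None, True)
            or code)
    return pattern.replace("{0}", formatted_number).replace("{1}", name)


print(format_currency(CHAIN, "ZWD", 1234, "1,234", english_count))
# 1,234 Zimbabwe dollars
```

A currency with no displayName data at all falls through to its code, so an unknown code such as "XYZ" with the value 5 would be rendered as "5 XYZ" by this sketch.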

When the currency symbol is substituted into a pattern, there may be some further modifications, according to the following.

<currencySpacing>
    <beforeCurrency>
        <currencyMatch>[:letter:]</currencyMatch>
        <surroundingMatch>[:digit:]</surroundingMatch>
        <insertBetween> </insertBetween>
    </beforeCurrency>
    <afterCurrency>
        <currencyMatch>[:letter:]</currencyMatch>
        <surroundingMatch>[:digit:]</surroundingMatch>
        <insertBetween> </insertBetween>
    </afterCurrency>
</currencySpacing>

This element controls whether additional characters are inserted on the boundary between the symbol and the pattern. For example,
with the above *currencySpacing*, inserting
the symbol "US$" into the pattern "#,##0.00¤" would result in an extra *no-break space* inserted before the symbol,
for example, "#,##0.00 US$".
The *beforeCurrency* element governs this case, since we are
looking *before* the "¤" symbol. The *currencyMatch* is positive, since the "U" in "US$" is
at the start of the currency symbol being substituted. The *surroundingMatch* is positive, since
the character just before the "¤" will be a digit. Because these
two conditions are true, the insertion is made.

Conversely, look at the
pattern "¤#,##0.00" with the symbol "US$". In this case, there is no insertion; the
result is simply "US$#,##0.00". The *afterCurrency* element governs this case,
since we are looking *after* the "¤" symbol. The surroundingMatch is positive, since the
character just after the "¤" will be a digit. However, the currencyMatch is **not** positive,
since the "$" in "US$" is at the end of the currency symbol being substituted. So the insertion
is not made.
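The two cases above can be reproduced with a small Python sketch. It is an approximation under stated assumptions: the function name `insert_symbol` is hypothetical, the input is an already-formatted number containing a "¤" placeholder, `[:letter:]` is approximated by Unicode general categories starting with "L", `[:digit:]` by category "Nd", and the inserted text is taken to be U+00A0 NO-BREAK SPACE.

```python
import unicodedata


def is_letter(ch):
    """Approximation of [:letter:] using the Unicode general category."""
    return unicodedata.category(ch).startswith("L")


def is_digit(ch):
    """Approximation of [:digit:] using the Unicode general category."""
    return unicodedata.category(ch) == "Nd"


def insert_symbol(formatted, symbol, insert_between="\u00a0"):
    """Replace the ¤ placeholder with symbol, applying currencySpacing."""
    index = formatted.index("¤")
    before, after = formatted[:index], formatted[index + 1:]
    result = before
    # beforeCurrency: first character of the symbol vs. character before ¤.
    if before and is_letter(symbol[0]) and is_digit(before[-1]):
        result += insert_between
    result += symbol
    # afterCurrency: last character of the symbol vs. character after ¤.
    if after and is_letter(symbol[-1]) and is_digit(after[0]):
        result += insert_between
    return result + after


print(repr(insert_symbol("1,234.57¤", "US$")))  # '1,234.57\xa0US$'
print(repr(insert_symbol("¤1,234.57", "US$")))  # 'US$1,234.57'
print(repr(insert_symbol("¤1,234.57", "USD")))  # 'USD\xa01,234.57'
```

The third call shows the complementary case: a symbol that ends in a letter, placed before a digit, does receive the inserted space.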

For more information
on the matching used in the currencyMatch and surroundingMatch elements, see the main document *Appendix E: Unicode Sets*.

Currencies can also contain optional grouping, decimal data, and pattern elements. This data is inherited from the <symbols> in the same locale data (if
not present in the chain up to root), so only the *differing* data will be present. See the main document *Section 4.1 Multiple Inheritance*.

Note: Currency values should never be interchanged without a known currency code. You never want the number 3.5 interpreted as $3.5 by one user and ¥3.5 by another. Locale data contains localization information for currencies, not a currency value for a country. A currency amount logically consists of a numeric value, plus an accompanying currency code (or equivalent). The currency code may be implicit in a protocol, such as where USD is implicit. But if the raw numeric value is transmitted without any context, then it has no definitive interpretation.

Notice that the currency code is completely independent of the end-user's language or locale. For example, RUR is the code for Russian Rubles. A currency amount of <RUR, 1.23457×10³> would be localized for a Russian user into "1 234,57р." (using U+0440 (р) cyrillic small letter er). For an English user it would be localized into the string "Rub 1,234.57". The end-user's language is needed for doing this last localization step; but that language is completely orthogonal to the currency code needed in the data. After all, the same English user could be working with dozens of currencies. Notice also that the currency code is independent of whether currency values are inter-converted, which requires more interesting financial processing: the rate of conversion may depend on a variety of factors.

Thus, once a currency amount is entered into a system, it should logically be accompanied by a currency code in all processing. This currency code is independent of whatever the user's original locale was. Only in badly-designed software is the currency code (or equivalent) not present, so that the software has to "guess" at the currency code based on the user's locale.

Note: The number of decimal places and the rounding for each currency are not locale-specific data, and are not contained in the Locale Data Markup Language format. Those values override whatever is given in the currency numberFormat. For more information, see Supplemental Currency Data.

For background information on currency names, see [CurrencyInfo].

<!ELEMENT currencyData ( fractions*, region+ ) >

<!ELEMENT fractions ( info+ ) >

<!ELEMENT info EMPTY >

<!ATTLIST info iso4217 NMTOKEN #REQUIRED >

<!ATTLIST info digits NMTOKEN #IMPLIED >

<!ATTLIST info rounding NMTOKEN #IMPLIED >

<!ELEMENT region ( currency* ) >

<!ATTLIST region iso3166 NMTOKEN #REQUIRED >

<!ELEMENT currency ( alternate* ) >

<!ATTLIST currency from NMTOKEN #IMPLIED >

<!ATTLIST currency to NMTOKEN #IMPLIED >

<!ATTLIST currency iso4217 NMTOKEN #REQUIRED >

<!ATTLIST currency tender ( true | false ) #IMPLIED >

Each currencyData element contains one fractions element followed by one or more region elements. Here is an example for illustration.

<supplementalData>
    <currencyData>
        <fractions>
            ...
            <info iso4217="CHF" digits="2" rounding="5"/>
            ...
            <info iso4217="ITL" digits="0"/>
            ...
        </fractions>
        ...
        <region iso3166="IT">
            <currency iso4217="EUR" from="1999-01-01"/>
            <currency iso4217="ITL" from="1862-8-24" to="2002-02-28"/>
        </region>
        ...
        <region iso3166="CS">
            <currency iso4217="EUR" from="2003-02-04"/>
            <currency iso4217="CSD" from="2002-05-15"/>
            <currency iso4217="YUM" from="1994-01-24" to="2002-05-15"/>
        </region>
        ...
    </currencyData>
    ...
</supplementalData>

The fractions element contains any number of info elements, with the following attributes:

- **iso4217:** the ISO 4217 code for the currency in question. If a particular currency does not occur in the fractions list, then it is given the defaults listed for the next two attributes.
- **digits:** the number of decimal digits normally formatted. The default is 2.
- **rounding:** the rounding increment, in units of 10^{-digits}. The default is 1. Thus with fraction digits of 2 and rounding increment of 5, numeric values are rounded to the nearest 0.05 units in formatting. With fraction digits of 0 and rounding increment of 50, numeric values are rounded to the nearest 50.
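To make the relationship between *digits*, *rounding*, and the effective increment concrete, here is a small Python sketch. The `FRACTIONS` table is a hypothetical in-memory copy of the two info elements from the example above, the function names are illustrative, and half-even rounding is assumed as the rounding mode.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical in-memory copy of the <info> elements shown in the example.
FRACTIONS = {
    "CHF": {"digits": 2, "rounding": 5},
    "ITL": {"digits": 0},
}


def currency_increment(code):
    """Effective increment: rounding * 10^(-digits), with the stated defaults."""
    info = FRACTIONS.get(code, {})
    digits = info.get("digits", 2)      # default is 2
    rounding = info.get("rounding", 1)  # default is 1
    return Decimal(rounding).scaleb(-digits)


def round_currency(value, code):
    """Round value to the currency's increment (half-even assumed)."""
    increment = currency_increment(code)
    steps = (Decimal(str(value)) / increment).quantize(
        Decimal("1"), rounding=ROUND_HALF_EVEN)
    return steps * increment


print(currency_increment("CHF"))       # 0.05
print(currency_increment("USD"))       # 0.01 (not listed, so defaults apply)
print(round_currency("12.37", "CHF"))  # 12.35
print(round_currency("12.38", "CHF"))  # 12.40
```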

Each region element contains one attribute:

- **iso3166:** the ISO 3166 code for the region in question. The special value *XXX* can be used to indicate that the region has no valid currency or that the circumstances are unknown (usually used in conjunction with *to*, as described below).

Each region element can also have any number of currency elements, with the ordered subelements.

<region iso3166="IT"> <!-- Italy -->
    <currency iso4217="EUR" from="2002-01-01"/>
    <currency iso4217="ITL" to="2001-12-31"/>
</region>

- **iso4217:** the ISO 4217 code for the currency in question. Note that some additional codes that were in widespread usage are included; others such as GHP are not included because they were never used.
- **from:** the currency was valid from the datetime indicated by the value. See the main document *Section 5.2.1 Dates and Date Ranges*.
- **to:** the currency was valid up to the datetime indicated by the value. See the main document *Section 5.2.1 Dates and Date Ranges*.
- **tender:** indicates whether or not the ISO currency code represents a currency that was or is legal tender in some country. The default is "true". Certain ISO codes represent things like financial instruments or precious metals, and do not represent normally interchanged currencies.

That is, each currency element will list an interval in which it was valid. The *ordering* of the elements in the list tells us which was the primary
currency during any period in time. Here is an example of such an overlap:

<currency iso4217="CSD" to="2002-05-15"/>
<currency iso4217="YUD" from="1994-01-24" to="2002-05-15"/>
<currency iso4217="YUN" from="1994-01-01" to="1994-07-22"/>

The *from* attribute is limited by the fact that ISO 4217 does not go very far back in time, so there may be no ISO code for the previous currency.
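A minimal Python sketch of how the validity intervals and the element ordering might be used to find the currencies of a region on a given day. The tuple layout and function names are hypothetical, the data is copied from the Italian entries in the earlier supplemental data example, and the *to* date is assumed to be inclusive.

```python
from datetime import date

# (iso4217, from, to) in document order; None means the attribute is absent.
ITALY = [
    ("EUR", "1999-01-01", None),
    ("ITL", "1862-8-24", "2002-02-28"),
]


def parse_date(text):
    """Parse yyyy-mm-dd, tolerating unpadded fields such as 1862-8-24."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def currencies_on(entries, day):
    """Return codes valid on `day`, in document order (primary first)."""
    valid = []
    for code, start, end in entries:
        if start is not None and day < parse_date(start):
            continue
        if end is not None and day > parse_date(end):  # 'to' assumed inclusive
            continue
        valid.append(code)
    return valid


print(currencies_on(ITALY, date(1990, 1, 1)))  # ['ITL']
print(currencies_on(ITALY, date(2000, 6, 1)))  # ['EUR', 'ITL']
print(currencies_on(ITALY, date(2005, 1, 1)))  # ['EUR']
```

During the overlap, the first element of the returned list corresponds to the primary currency, following the ordering rule described above.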

Currencies change relatively frequently. There are different types of changes:

1. YU=>CS (name change)
2. CS=>RS+ME (split, different names)
3. SD=>SD+SS (split, same name for one // South Sudan splits from Sudan)
4. DE+DD=>DE (Union, reuses one name // East Germany unifies with Germany)

The UN Information is used to determine dates due to country changes.

When a code is no longer in use, it is terminated (see #1, #2, #4, #5).

Example:

- <currency iso4217="EUR" from="2003-02-04" to="2006-06-03"/>

When codes split, each of the new codes inherits (see #2, #3) the previous data. However, some modifications can be made if it is clear that currencies were only in use in one of the parts.

When codes merge, the data is copied from the most populous part.

Example. When CS split into RS and ME:

- RS & ME copy the former CS, except that the line for EUR is dropped from RS
- CS now terminates on Jun 3, 2006 (following the UN info)

<!ELEMENT plurals (pluralRules*) >

<!ATTLIST plurals type ( ordinal | cardinal ) #IMPLIED > <!-- default is cardinal -->

<!ELEMENT pluralRules (pluralRule*) >

<!ATTLIST pluralRules locales NMTOKENS #REQUIRED >

<!ELEMENT pluralRule ( #PCDATA ) >

<!ATTLIST pluralRule count (zero | one | two | few | many) #REQUIRED >

This section defines certain types of plural forms that exist in a language—namely, the cardinal and ordinal plural forms for nouns. Cardinal plural forms express units such as time, currency or distance, used in conjunction with a number expressed in decimal digits (i.e. "2", not "two", and not an indefinite number such as "some" or "many"). Ordinal plural forms denote the order of items in a set and are always integers. For example, English has two forms for cardinals:

- form "one": 1 day
- form "other": 0 days, 2 days, 10 days, 0.3 days

and four forms for ordinals:

- form "one": 1st floor, 21st floor, 101st floor
- form "two": 2nd floor, 22nd floor, 102nd floor
- form "few": 3rd floor, 23rd floor, 103rd floor
- form "other": 4th floor, 11th floor, 96th floor

Other languages may have additional forms or only one form for each type of plural. CLDR provides the following tags for designating the various plural forms of a language; for a given language, only the tags necessary for that language are defined, along with the specific numeric ranges covered by each tag (for example, the plural form "few" may be used for the numeric range 2-4 in one language and 3-9 in another):

- zero
- one
- two
- few
- many

In addition, an "other" tag is always implicitly defined to cover the forms not explicitly designated by the tags defined for a language. This "other" tag is also used for languages that only have a single form (in which case no plural-form tags are explicitly defined for the language). For a more complex example, consider the cardinal rules for Russian and certain other languages:

<pluralRules locales="hr ru sr uk">
    <pluralRule count="one">n mod 10 is 1 and n mod 100 is not 11</pluralRule>
    <pluralRule count="few">n mod 10 in 2..4 and n mod 100 not in 12..14</pluralRule>
</pluralRules>

These rules specify that Russian has a "one" form (for 1, 21, 31, 41, 51, …), a "few" form (for 2-4, 22-24, 32-34, …), and implicitly an "other" form (for everything else: 0, 5-20, 25-30, 35-40, …, decimals). Russian does not need additional separate forms for zero, two, or many, so these are not defined.

The XML value for each pluralRule is a *condition* with a boolean result that specifies whether that rule (i.e. that plural form) applies to a given numeric value *n*, where n can be expressed as a decimal fraction. Conditions have the following syntax:

condition = and_condition ('or' and_condition)*
and_condition = relation ('and' relation)*
relation = is_relation | in_relation | within_relation
is_relation = expr 'is' ('not')? value
in_relation = expr ('not')? 'in' range_list
within_relation = expr ('not')? 'within' range_list
expr = 'n' ('mod' value)?
range_list = (range | value) (',' range_list)*
value = digit+
digit = 0|1|2|3|4|5|6|7|8|9
range = value'..'value

- Whitespace (defined as Unicode Pattern_White_Space) can occur between or around any of the above tokens.
- In the syntax, **and** binds more tightly than **or**. So **X or Y and Z** is interpreted as **(X or (Y and Z))**.
- Each plural rule must be written to be self-contained, and not depend on the ordering. Thus rules must be mutually exclusive; for a given numeric value, only one rule can apply (i.e. the condition can only be true for one of the pluralRule elements).
- The **in** and **within** relations can take comma-separated lists, such as: **n in 3,5,7..15**. The difference between **in** and **within** is that **in** only includes integers in the specified range, while **within** includes all values.
- **mod** (modulus) is a remainder operation as defined in Java; for example, where **n** = 4.3 the result of **n mod 3** is 1.3.
- To detect an integer in a rule, use **n mod 1 is 0**. Conversely, for a fraction use: **n mod 1 is not 0**.
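The condition syntax and the notes above can be exercised with a small recursive-descent evaluator. The following Python sketch is illustrative only: the names `evaluate` and `select` are hypothetical, `Decimal` is used so that the remainder behaves like the Java operation for decimal fractions, and characters that match no token are simply skipped rather than reported.

```python
import re
from decimal import Decimal

TOKEN = re.compile(r"\d+|\.\.|,|[a-z]+")


def evaluate(condition, n):
    """Return True if the plural-rule condition holds for the number n."""
    n = Decimal(str(n))
    tokens = TOKEN.findall(condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"unexpected token {token!r} in {condition!r}")
        pos += 1
        return token

    def relation():
        take("n")
        value = n
        if peek() == "mod":
            take()
            value = n % int(take())  # remainder keeps the sign of n, as in Java
        negated = False
        if peek() == "is":
            take()
            if peek() == "not":
                take()
                negated = True
            result = value == int(take())
        else:
            if peek() == "not":
                take()
                negated = True
            operator = take()
            if operator not in ("in", "within"):
                raise ValueError(f"unexpected token {operator!r}")
            result = False
            while True:  # range_list = (range | value) (',' range_list)*
                low = high = int(take())
                if peek() == "..":
                    take()
                    high = int(take())
                inside = low <= value <= high
                if operator == "in":  # 'in' accepts integers only
                    inside = inside and value == value.to_integral_value()
                result = result or inside
                if peek() != ",":
                    break
                take()
        return result != negated

    def and_condition():
        result = relation()
        while peek() == "and":
            take()
            result = relation() and result  # parse first, then combine
        return result

    result = and_condition()
    while peek() == "or":
        take()
        result = and_condition() or result
    if peek() is not None:
        raise ValueError(f"trailing token {peek()!r} in {condition!r}")
    return result


def select(rules, n):
    """Return the first count whose condition matches, else 'other'."""
    for count, condition in rules.items():
        if evaluate(condition, n):
            return count
    return "other"


RUSSIAN = {
    "one": "n mod 10 is 1 and n mod 100 is not 11",
    "few": "n mod 10 in 2..4 and n mod 100 not in 12..14",
}
print([select(RUSSIAN, n) for n in (1, 2, 5, 11, 21, 22, "1.5")])
# ['one', 'few', 'other', 'other', 'one', 'few', 'other']
```

Because the rules are required to be mutually exclusive, the order in which `select` tries them should not affect the result.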

Examples:

one: n is 1; few: n in 2..4 | This defines two rules, for 'one' and 'few'. The condition for 'one' is "n is 1" which means that the number must be equal to 1 for this condition to pass. The condition for 'few' is "n in 2..4" which means that the number must be between 2 and 4 inclusive for this condition to pass. All other numbers are assigned the keyword 'other' by the default rule. |

zero: n is 0 or n is not 1 and n mod 100 in 1..19; one: n is 1 | Each rule must not overlap with other rules. Also note that a modulus is applied to n in the last relation of the 'zero' rule, thus its condition holds for 119, 219, 319... |

one: n is 1; few: n mod 10 in 2..4 and n mod 100 not in 12..14 | This illustrates conjunction and negation. The condition for 'few' has two parts, both of which must be met: "n mod 10 in 2..4" and "n mod 100 not in 12..14". The first part applies a modulus to n before the test as in the previous example. The second part applies a different modulus and also uses negation, thus it matches all numbers not in 12, 13, 14, 112, 113, 114, 212, 213, 214... |

Elements such as <currencyFormats>, <currency> and <unit> provide selection among subelements designating various localized cardinal plural forms by tagging each of the relevant subelements with a different count value, or with no count value in some cases. Note that the plural forms for a specific currencyFormat, unit type, or currency type may not use all of the different plural-form tags defined for the language. To format a currency or unit type for a particular numeric value, determine the count value according to the plural rules for the language, then select the appropriate display form for the currency format, currency type or unit type using the rules in those sections:

- 2.3 Number Symbols (for currencyFormats elements)
- Section 4 Currencies (for currency elements)
- The main document section 5.11 Unit Elements

There are two extra values that can be used with count attributes: 0 and 1. These are used for the explicit values, and may or may not be the same as the forms for "zero" and "one".

<!ELEMENT rbnf ( alias | rulesetGrouping*) >

<!ELEMENT rulesetGrouping ( alias | ruleset*) >

<!ATTLIST rulesetGrouping type NMTOKEN #REQUIRED>

<!ELEMENT ruleset ( alias | rbnfrule*) >

<!ATTLIST ruleset type NMTOKEN #REQUIRED>

<!ATTLIST ruleset access ( public | private ) #IMPLIED >

<!ELEMENT rbnfrule ( #PCDATA ) >

<!ATTLIST rbnfrule value CDATA #REQUIRED >

<!ATTLIST rbnfrule radix CDATA #IMPLIED >

<!ATTLIST rbnfrule decexp CDATA #IMPLIED >

The rule-based number format (RBNF)
encapsulates a set of rules for mapping binary numbers to and from a
readable representation. They are typically used for spelling out numbers,
but can also be used for other number systems like roman numerals, Chinese
numerals, or for ordinal numbers (1st, 2nd, 3rd,...). The syntax used in the
CLDR representation of rules is intended to be simply a transcription of ICU
based RBNF rules into an XML compatible syntax. The rules are fairly
sophisticated; for details see *Rule-Based Number Formatter* [RBNF].

<rulesetGrouping>

Used to group rules into functional sets for use with ICU. Currently, the valid types of rule set groupings are "SpelloutRules", "OrdinalRules", and "NumberingSystemRules".

<ruleset>

This element denotes a specific rule set to the number formatter. The ruleset is assumed to be a public ruleset unless the attribute access="private" is specified.

<rbnfrule>

Contains the actual formatting rule for a particular number or sequence of numbers. The "value" attribute is used to indicate the starting number to which the rule applies. The actual text of the rule is identical to the ICU syntax, with the exception that Unicode left and right arrow characters are used to replace < and > in the rule text, since < and > are reserved characters in XML. The "radix" attribute is used to indicate an alternate radix to be used in calculating the prefix and postfix values for number formatting. Alternate radix values are typically used for formatting year numbers in formal documents, such as "nineteen hundred seventy-six" instead of "one thousand nine hundred seventy-six".
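The arrow substitution described above can be sketched in a few lines of Python. This is an assumption-laden illustration: the text only says "Unicode left and right arrow characters", so the use of U+2190 (←) and U+2192 (→) here is an assumption, the function names are hypothetical, and the sample rule text is made up for demonstration.

```python
# Assumption: the XML form uses U+2190 and U+2192 in place of '<' and '>'.
XML_TO_ICU = str.maketrans({"\u2190": "<", "\u2192": ">"})
ICU_TO_XML = str.maketrans({"<": "\u2190", ">": "\u2192"})


def rule_text_to_icu(text):
    """Convert rule text as stored in XML into ICU-style rule text."""
    return text.translate(XML_TO_ICU)


def rule_text_to_xml(text):
    """Convert ICU-style rule text into the XML-safe form."""
    return text.translate(ICU_TO_XML)


sample = "\u2190\u2190 hundred[ \u2192\u2192];"  # illustrative rule text
print(rule_text_to_icu(sample))  # << hundred[ >>];
```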

The following elements are relevant to determining the value of a parsed number:

- A possible prefix or suffix, indicating sign
- A possible currency symbol or code
- Decimal digits
- A possible decimal separator
- A possible exponent
- A possible percent or per mille character

Other characters should either be ignored, or indicate the end of input, depending on the application. The key point is to disambiguate the sets of characters that might serve in more than one position, based on context. For example, a period might be either the decimal separator, or part of a currency symbol (for example, "NA f."). Similarly, an "E" could be an exponent indicator, or a currency symbol (the Swaziland Lilangeni uses "E" in the "en" locale). An apostrophe might be the decimal separator, or might be the grouping separator.

Here is a set of heuristic rules that may be helpful:

- Any character with the decimal digit property is unambiguous and should be accepted. **Note:** In some environments, applications may independently wish to restrict the decimal digit set to prevent security problems. See [UTR36].
- The exponent character can only be interpreted as such if it occurs after at least one digit, and if it is followed by at least one digit, with only an optional sign in between. A regular expression may be helpful here.
- For the sign, decimal separator, percent, and per mille, use a set of all possible characters that can serve those functions. For example, the decimal separator set could include all of [.,']. (The actual set of characters can be derived from the number symbols in the By-Type charts [ByType], which list all of the values in CLDR.) To disambiguate, the decimal separator for the locale must be removed from the "ignore" set, and the grouping separator for the locale must be removed from the decimal separator set. The same principle applies to all sets and symbols: any symbol must appear in at most one set.
- Since there are a wide variety of currency symbols and codes, this should be tried before the less ambiguous elements. It may be helpful to develop a set of characters that can appear in a symbol or code, based on the currency symbols in the locale.
- Otherwise, a character should be ignored unless it is in the "stop" set. This includes even characters that are meaningful for formatting, for example, the grouping separator.
- If more than one sign, currency symbol, exponent, or percent/per mille occurs in the input, the first found should be used.
- A currency symbol in the input should be interpreted as the longest match found in the set of possible currency symbols.
- Especially in cases of ambiguity, the user's input should be echoed back, properly formatted according to the locale, before it is actually used for anything.
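Two of the heuristics above, the exponent test and the longest-match rule for currency symbols, can be sketched in Python as follows. The function names are hypothetical, and the exponent character is assumed to be "E" or "e" with ASCII signs; a real implementation would take these from the locale's number symbols.

```python
import re

# An exponent needs a digit before it and at least one digit after it,
# with only an optional sign in between. In Python 3, \d matches any
# character with the Unicode decimal digit property.
EXPONENT = re.compile(r"(?<=\d)[Ee][+-]?\d+")


def find_exponent(text):
    """Return the exponent part of text, or None if there is none."""
    match = EXPONENT.search(text)
    return match.group(0) if match else None


def longest_currency_match(text, start, symbols):
    """Return the longest symbol found at position start, or None."""
    candidates = [s for s in symbols if s and text.startswith(s, start)]
    return max(candidates, key=len, default=None)


print(find_exponent("1.5E3"))  # E3
print(find_exponent("E15"))    # None ("E" here could be a currency symbol)
print(longest_currency_match("US$1.50", 0, ["$", "US$", "U"]))  # US$
```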

Copyright © 2001-2013 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.