Unicode Frequently Asked Questions

Coping with Change

Q: I don't see why I should update to the latest version of the Unicode Standard. Are there any important new characters?

A: There are important changes in almost every release of the Unicode Standard. If your implementation is still at an earlier version, then you are missing the characters added in more recent versions. For example, a significant number of characters important for the support of languages in India and Southeast Asia have been added in new versions. For East Asia, characters have been added to fill out compatibility with major standards such as JIS X 0213, GB 18030, and HKSCS. Additionally, many symbols, including very popular emoji, appear in each new version. All of these characters are important to some user community.

Q: Which characters exactly were added?

A: The file http://www.unicode.org/Public/UCD/latest/ucd/DerivedAge.txt lists which characters were added in each successive version of the standard.
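As a rough illustration of how that file can be read, here is a minimal Python sketch. It assumes the usual layout of a code point or range, a semicolon, a version number, and a trailing "#" comment. The function names are hypothetical, and SAMPLE is a short excerpt written in the file's format for demonstration; consult the actual file for authoritative data.

```python
from collections import defaultdict

# Short excerpt in the DerivedAge.txt layout, for demonstration only.
SAMPLE = """\
# code point(s) ; version # comment
0000..01F5    ; 1.1 # [502] <control>..LATIN SMALL LETTER G WITH ACUTE
20AC          ; 2.1 #       EURO SIGN
01F6..01F9    ; 3.0 #   [4] LATIN CAPITAL LETTER HWAIR..LATIN SMALL LETTER N WITH GRAVE
"""


def parse_derived_age(text):
    """Return {version: [(first, last), ...]} from DerivedAge-style text."""
    ranges = defaultdict(list)
    for line in text.splitlines():
        # Everything after '#' is a comment; skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_points, version = (field.strip() for field in data.split(";"))
        # A field is either a single code point or "first..last".
        first, _, last = code_points.partition("..")
        ranges[version].append((int(first, 16), int(last or first, 16)))
    return dict(ranges)


def count_by_version(ranges):
    """Count the code points assigned in each version."""
    return {
        version: sum(last - first + 1 for first, last in spans)
        for version, spans in ranges.items()
    }


if __name__ == "__main__":
    for version, count in count_by_version(parse_derived_age(SAMPLE)).items():
        print(version, count)
```

To process the real data, the downloaded contents of DerivedAge.txt would be passed to parse_derived_age in place of SAMPLE.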

Q: Fonts and input methods or keyboards are really expensive to produce. Do I have to support all the new characters for them?

A: Supporting the latest version of Unicode does not require that you have fonts or keyboards for all the characters. You always have a choice of what repertoire of Unicode characters you want to support in your product. Fonts and keyboards can be added incrementally.

Q: But what else would I want to support in the latest version of the standard?

A: Even if you are not supplying keyboards and fonts, you will probably need your software to handle the properties of the new characters correctly.

Q: Why should I support Unicode properties?

A: Unicode properties are widely used under the covers. Text parsers use them to separate letters from punctuation and symbols. Anything that uses regular expressions, such as XML Schema, uses them. They are used in uppercase/lowercase conversions and in case-insensitive matching. They also coordinate with the latest versions of the Unicode Collation Algorithm for sorting.
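As one small illustration of case-insensitive matching driven by Unicode data, the following Python sketch uses the built-in str.casefold(). The helper name same_ignoring_case is hypothetical, and the sketch deliberately ignores normalization, which a production matcher would probably also need.

```python
def same_ignoring_case(a, b):
    """Compare two strings using Unicode case folding."""
    # casefold() is more aggressive than lower(): for example, it maps
    # the German sharp s to "ss", so both spellings below compare equal.
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(same_ignoring_case("Straße", "STRASSE"))
    # A plain lower() comparison treats the same pair as different.
    print("Straße".lower() == "STRASSE".lower())
```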

In globalization coding guidelines, we strongly recommend that hard-coded expressions like

if ('a' <= x && x <= 'z' || 'A' <= x && x <= 'Z') doSomething();

should normally change to use appropriate Unicode properties, something like the following (depending on what was originally meant):

if (getCategory(x) == LETTER) doSomething();

or

if (getCategory(x) == LETTER && getScript(x) == LATIN) doSomething();

Using an old version of Unicode will mean that new characters will be ignored in such processing, or included where they are not meant to be. Importantly, fixes in properties—even for old characters—are made over time, and using the latest version of the properties ensures that you have the most accurate data you can.
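The contrast between the hard-coded range test and a property-based test can be sketched in Python with the standard unicodedata module. The function names are hypothetical. Note one assumption in particular: the standard library exposes General_Category but not the Script property, so is_latin_letter falls back on the character name beginning with "LATIN", which is only a rough stand-in for a real script lookup and misses some Latin-script characters.

```python
import unicodedata


def is_ascii_letter(ch):
    """The hard-coded test: only the 52 basic Latin letters pass."""
    return "a" <= ch <= "z" or "A" <= ch <= "Z"


def is_letter(ch):
    """Property-based test: General_Category Lu, Ll, Lt, Lm, or Lo."""
    return unicodedata.category(ch).startswith("L")


def is_latin_letter(ch):
    """Letter whose name starts with LATIN (heuristic, not the Script property)."""
    return is_letter(ch) and unicodedata.name(ch, "").startswith("LATIN ")


if __name__ == "__main__":
    for ch in ["a", "é", "я", "1"]:
        print(repr(ch), is_ascii_letter(ch), is_letter(ch), is_latin_letter(ch))
```

The results depend on the version of the Unicode data bundled with the runtime: a character added in a later version is reported as unassigned, and therefore not as a letter, by an older table.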

Q: Is it cost-effective to update the Unicode character properties in my product?

A: There are good reasons to update the Unicode character properties to the latest version whenever you can, since the cost is rather small compared to the benefits. For servers and middleware, supporting new Unicode characters typically amounts to just updating the property tables appropriately.
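Before deciding whether an update is needed, it can help to check which version of the property tables a runtime already carries. A minimal Python sketch follows, assuming the standard unicodedata module; the helper names are hypothetical.

```python
import unicodedata


def parse_version(text):
    """Turn a version string such as "15.0.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def runtime_unicode_version():
    """Version of the Unicode Character Database bundled with this Python."""
    return parse_version(unicodedata.unidata_version)


def is_at_least(required):
    """True if the bundled property data is at or above the given version."""
    return runtime_unicode_version() >= parse_version(required)


if __name__ == "__main__":
    print("Bundled Unicode data:", unicodedata.unidata_version)
    print("At least 6.0.0:", is_at_least("6.0.0"))
```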

Q: How do I find out about all the different versions of Unicode?

A: Documentation of the contents of each version of the Unicode Standard is found on the Enumerated Versions page. That page also provides links to blog posts which provide information about what was especially important in each new version.

Q: How do I cite the Unicode Standard in my references?

A: See Versions of the Unicode Standard.

Q: How much does the Unicode Standard change between different versions?

A: Characters can be added in each major or minor version of the standard. Properties and other specifications can be added or changed. However, all changes are subject to the Unicode stability policy. See the Character Encoding Stability Policy for more information.