Coping with Change
Q: I don't see why I should update to the latest version of the Unicode Standard. Are there any important new characters?
A: There are important changes in almost every release of Unicode. If your implementation is still at Unicode 4.0 or 5.0, then quite a few important characters have
been added in more recent versions. For example, a significant number of characters
important for support of languages in India and Southeast Asia have been added. For East Asia,
characters have been added to fill out compatibility with important
standards such as JIS X 0213, GB 18030, and HKSCS. Additionally, many symbols have been added that are important for interoperability with the Japanese television standard and Japanese mobile phones. All of the characters are important to some user community.
Q: Which characters exactly were added?
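A: If you look at the Age property in the Unicode Character Database, published in the data file DerivedAge.txt,
you can see which characters have been added in each successive version of the standard.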
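The Unicode Character Database records the version in which each code point was first assigned in the data file DerivedAge.txt. As a rough illustration of how that data can be used, the sketch below tallies how many code points each version assigned. It is written in Java, assumes the DerivedAge.txt line format (a code point or range, a semicolon, the version, then a `#` comment), and runs on a few hand-copied sample lines rather than the real file; the class name `DerivedAgeTally` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts code points per Unicode version from DerivedAge.txt-style lines.
class DerivedAgeTally {

    // Returns version -> number of code points first assigned in that version, in input order.
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : lines) {
            // Drop the trailing "#" comment, then skip lines that are now empty.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            // Data lines look like "0020..007E ; 1.1" (a range) or "20AC ; 2.1" (one code point).
            String[] fields = line.split(";");
            String age = fields[1].trim();
            String[] ends = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(ends[0], 16);
            int end = ends.length > 1 ? Integer.parseInt(ends[1], 16) : start;
            counts.merge(age, end - start + 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines in the DerivedAge.txt format; not the complete file.
        List<String> sample = List.of(
            "# comment lines and blank lines are ignored",
            "",
            "0000..001F    ; 1.1 #  [32] <control-0000>..<control-001F>",
            "0020..007E    ; 1.1 #  [95] SPACE..TILDE",
            "20AC          ; 2.1 #       EURO SIGN");
        System.out.println(tally(sample)); // {1.1=127, 2.1=1}
    }
}
```

Run against the full, current DerivedAge.txt, the same loop gives per-version totals; comparing against the version your product implements shows exactly which characters it does not yet know about.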
Q: Fonts and input methods or keyboards are really expensive to
produce. Do I have to support all the new characters for them?
A: Supporting the latest version of Unicode does not require that you have fonts or
keyboards for all the characters. You always have a choice of what
repertoire of Unicode characters you want to support in your product.
Fonts and keyboards can be added incrementally.
Q: But what else would I want to support in
the latest version of the standard?
A: Even if you are not supplying keyboards and fonts, you will
probably need your software to handle the properties of the new
characters correctly. Unicode 6.3 also made a major update to the handling of bidirectional text.
Q: Why should I support Unicode properties?
A: Unicode properties are widely used under the
covers. Text parsers use them to separate letters from
punctuation and symbols. Anything that uses regular expressions, such as XML Schema, uses them. They are used in uppercase/lowercase
conversion and in case-insensitive matching. They are also coordinated with
the latest version of the Unicode Collation Algorithm, which is used for sorting.
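For example, Java's `java.util.regex` engine exposes Unicode properties directly. The minimal sketch below (the class and field names are illustrative) matches letters by General_Category with `\p{L}`, restricts a match to the Latin script with `\p{IsLatin}`, and shows that `Pattern.CASE_INSENSITIVE` folds only US-ASCII unless `Pattern.UNICODE_CASE` is also set. Non-ASCII characters are written as `\u` escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

class UnicodeRegexDemo {
    // A run of letters (General_Category L) in any script.
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");
    // A run of characters in the Latin script (Script=Latin).
    static final Pattern LATIN_LETTERS = Pattern.compile("\\p{IsLatin}+");
    // CASE_INSENSITIVE alone folds only US-ASCII characters...
    static final Pattern CAFE_ASCII_FOLD = Pattern.compile("caf\u00e9", Pattern.CASE_INSENSITIVE);
    // ...while adding UNICODE_CASE applies Unicode case folding to every character.
    static final Pattern CAFE_UNICODE_FOLD =
        Pattern.compile("caf\u00e9", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // Greek small alpha, beta, gamma
        String upperCafe = "CAF\u00c9";      // "CAFE" ending in a capital E with acute
        System.out.println(LETTERS.matcher(greek).matches());                // true
        System.out.println(LATIN_LETTERS.matcher(greek).matches());          // false
        System.out.println(CAFE_ASCII_FOLD.matcher(upperCafe).matches());    // false
        System.out.println(CAFE_UNICODE_FOLD.matcher(upperCafe).matches());  // true
    }
}
```

Which characters count as letters, which script they belong to, and how they case-fold all come from the Unicode data bundled with the runtime, so the answers can change when that data is updated.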
In globalization coding guidelines, we strongly recommend that
hard-coded expressions like
if ('a' <= x && x <= 'z' || 'A' <= x && x <= 'Z') doSomething();
should normally be changed to use appropriate Unicode properties,
with something like one of the following (depending on what was originally meant):
if (getCategory(x) == LETTER) doSomething();
if (getCategory(x) == LETTER && getScript(x) == LATIN) doSomething();
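In Java, for instance, the pseudocode above maps fairly directly onto `Character.isLetter(int)`, which is true for General_Category Lu, Ll, Lt, Lm, and Lo, and `Character.UnicodeScript.of(int)`. The sketch below is one possible translation, not the only one; which test is right still depends on what the original code meant. The class name `LetterChecks` is hypothetical.

```java
// Hypothetical class contrasting a hard-coded ASCII range test with property-based tests.
class LetterChecks {
    // The hard-coded test: only ASCII a-z and A-Z.
    static boolean isAsciiLetter(int cp) {
        return ('a' <= cp && cp <= 'z') || ('A' <= cp && cp <= 'Z');
    }

    // Any letter: General_Category Lu, Ll, Lt, Lm, or Lo, in any script.
    static boolean isLetter(int cp) {
        return Character.isLetter(cp);
    }

    // A letter whose Script property is Latin.
    static boolean isLatinLetter(int cp) {
        return Character.isLetter(cp)
            && Character.UnicodeScript.of(cp) == Character.UnicodeScript.LATIN;
    }

    public static void main(String[] args) {
        // 'a', e with acute, Greek capital omega, digit one
        int[] samples = {'a', 0x00E9, 0x03A9, '1'};
        for (int cp : samples) {
            System.out.printf("U+%04X ascii=%b letter=%b latin=%b%n",
                cp, isAsciiLetter(cp), isLetter(cp), isLatinLetter(cp));
        }
    }
}
```

The ASCII test rejects U+00E9 even though it is a Latin letter. The property-based answers come from the Unicode data bundled with the JDK; the documentation for `java.lang.Character` states which Unicode version a given JDK release implements, so an older JDK can misclassify characters added in later versions.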
Using an old version of Unicode means that new characters will be
ignored in such processing, or included where they are not meant to be.
Importantly, property values are also corrected over time, even for long-established
characters, so using the latest version of the properties ensures that
you have the most accurate data available.
Q: Is it cost-effective to update the Unicode character
properties in my product?
A: There are good reasons to always update the Unicode
character properties to the latest version when you can, since the cost
(typically just updating data tables) is small compared to the
benefits. For servers and middleware in particular, supporting new Unicode characters often amounts to nothing more than
updating the property tables.
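To see why the cost is usually confined to data, consider one common way of storing a property: a sorted table of range start points with one value per range, searched by binary search. In the sketch below (Java; the class `RangeTable` and the toy ASCII-only table are illustrative assumptions, not any particular library's format), the lookup code never changes, so moving to a newer Unicode version means regenerating the two arrays from the newer Unicode Character Database files.

```java
import java.util.Arrays;

// Hypothetical range-table property lookup: code stays fixed, only the arrays change.
final class RangeTable {
    private final int[] starts;  // sorted start code point of each range, beginning at 0
    private final byte[] values; // property value for code points in [starts[i], starts[i+1])

    RangeTable(int[] starts, byte[] values) {
        if (starts.length == 0 || starts.length != values.length || starts[0] != 0) {
            throw new IllegalArgumentException("need one value per range, first range at 0");
        }
        this.starts = starts.clone();
        this.values = values.clone();
    }

    byte lookup(int cp) {
        int i = Arrays.binarySearch(starts, cp);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // range is the one just before the insertion point.
        if (i < 0) {
            i = -i - 2;
        }
        return values[i];
    }

    public static void main(String[] args) {
        // Toy table: 1 = ASCII letter, 0 = anything else. A real generated table
        // would cover the whole code space with many more ranges.
        RangeTable letters = new RangeTable(
            new int[]  {0x0000, 0x0041, 0x005B, 0x0061, 0x007B},
            new byte[] {0,      1,      0,      1,      0});
        System.out.println(letters.lookup('Q')); // 1
        System.out.println(letters.lookup('_')); // 0
    }
}
```

The toy table stops at ASCII, so it would report U+00E9 as a non-letter; a table regenerated from current data closes that kind of gap without touching `lookup`.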
Q: How do I find out about all the different versions of Unicode?
A: Documentation of the contents of each version of the Unicode
Standard is found on the
Enumerated Versions page. That page also links to blog posts describing what was especially important in each new version.
Q: How do I cite the Unicode Standard in my references?
A: See Versions
of the Unicode Standard.
Q: How much does the Unicode Standard change between different versions?
A: Characters can be added in each major or minor version of the
standard. Properties and other specifications can be added or changed.
However, all changes are subject to the Unicode stability policy. See
Character Encoding Stability Policy for more information.