Beta Unicode Character Database 3.1

From: Mark Davis
Date: Fri Dec 15 2000 - 17:36:58 EST

The first files in the Beta Unicode Character Database 3.1 are available
for public review. Unicode 3.1 contains a large number of new characters,
in the following new blocks:

10300; 1032F; Old Italic
10330; 1034F; Gothic
10400; 1044F; Deseret
1D000; 1D0FF; Byzantine Musical Symbols
1D100; 1D1FF; Musical Symbols
1D400; 1D7FF; Mathematical Alphanumeric Symbols
20000; 2A6D6; CJK Unified Ideographs Extension B
2F800; 2FA1F; CJK Compatibility Ideographs Supplement
E0000; E007F; Tags
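
As a rough illustration, the block ranges above can be read with a few
lines of Python. This is only a sketch: the function name parse_block_line
is hypothetical, and the "start; end; name" layout is assumed from the list
above rather than taken from the beta data files themselves.

```python
def parse_block_line(line):
    """Split a 'start; end; name' line into (start, end, name)."""
    start_hex, end_hex, name = [field.strip() for field in line.split(";", 2)]
    # int(..., 16) accepts any number of hex digits, so 5- and 6-digit
    # supplementary code points need no special handling here.
    return int(start_hex, 16), int(end_hex, 16), name


# A few of the ranges from the list above.
blocks = [parse_block_line(text) for text in (
    "10300; 1032F; Old Italic",
    "20000; 2A6D6; CJK Unified Ideographs Extension B",
    "E0000; E007F; Tags",
)]

for start, end, name in blocks:
    # A code point above U+FFFF lies outside the Basic Multilingual Plane.
    is_supplementary = start > 0xFFFF
    print(f"{name}: U+{start:04X}..U+{end:04X} supplementary={is_supplementary}")
```

Storing the ranges as integers rather than 4-character strings avoids
building a fixed-width assumption into later lookups.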

Note that these are supplementary characters, whose code points are given
in 5-digit hex form (UTF-32). See for more


- DO NOT use the data from these files in any released product. The beta
versions of these data files will not be maintained and are subject to
change without notice during the beta period for Unicode 3.1. They are
preliminary versions, and have known errors and omissions.

- DO test the data in these files with your products, and report any errors
to, with the title "UCD 3.1 Bug". Any messages that are
sent to other mailing lists, or that do not have that precise title, will
NOT be collected. Before reporting errors, please read


- Check your parsers carefully because the data files will contain
supplementary code points, using 5 or 6 hex digits instead of just 4.

- The main data file, UnicodeData.txt, has not changed in format, except
for the addition of supplementary code points. However, there are format
changes in other data files.

- The PropList file has a substantially new format, to make it more
machine-readable and reduce duplication. It will have further changes that
have not yet gone in.

- The East Asian Width and Line Break files have a slight revision in
format: the name (which is duplicate information) is now in a comment
rather than a field.

- CaseFolding has a new format that makes it much easier to distinguish full
and simple case foldings. It does not yet include case foldings for

- The documentation files (*.html) have not yet been updated.

- Any changes requiring decision by the Unicode Technical Committee will be
reviewed at the next meeting, Jan 29 - Feb 1. Please submit any error
reports by Jan 22. (Cf.
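
One way to act on the parser advice above is a small check like the
following. It is a sketch under stated assumptions: the helper
parse_data_line and the sample lines are made up for illustration (they are
not copied from the beta files), and it assumes semicolon-separated fields
with an optional "#" comment at the end of the line. Lines that give a range
of code points are not handled.

```python
import re

# Accept 4 to 6 hex digits rather than exactly 4.
CODE_POINT = re.compile(r"[0-9A-Fa-f]{4,6}")
MAX_CODE_POINT = 0x10FFFF


def parse_data_line(line):
    """Return (code_point, fields), or None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if not CODE_POINT.fullmatch(fields[0]):
        raise ValueError(f"bad code point field: {fields[0]!r}")
    code_point = int(fields[0], 16)
    if code_point > MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {fields[0]!r}")
    return code_point, fields[1:]


# Hypothetical sample lines: a 4-digit, a 5-digit, and a 6-digit code point.
samples = [
    "0041;X # sample name in a comment",
    "10330;X # sample name in a comment",
    "100000;X",
    "# a comment-only line",
]

for sample in samples:
    print(parse_data_line(sample))
```

A parser that slices the first four characters of each line, or that
matches exactly four hex digits, would misread the second and third sample
lines.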

Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5891]
