Unicode 3.0.1 update beta data files available

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Aug 04 2000 - 21:32:56 EDT

The beta directory for the Unicode 3.0.1 update has been created.

Due to the current problem with anonymous ftp on www.unicode.org,
only the http version of this directory is currently available:


The updated beta files at that location for the Unicode 3.0.1 update are:

   5179 Jul 31 21:14 ArabicShaping-3d1.beta.txt
  43559 Jul 31 21:14 CaseFolding-2d1.beta.txt
   5085 Jul 31 21:14 CompositionExclusions-2d1.beta.txt
  55254 Jul 31 21:14 PropList-3.0.1d2.beta.txt
  13841 Jul 31 21:14 SpecialCasing-3d2.beta.txt
  48261 Jul 31 21:14 UnicodeData-3.0.1d1.beta.html
 636269 Jul 31 21:15 UnicodeData-3.0.1d2.beta.txt

These are temporary names. Once the beta review closes, the "beta" and
the delta number on the files will be dropped for the permanent
versioned filename, and the latest versions of the files will
be copied into the UNIDATA directory minus the version extension.
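As a rough illustration of that renaming, the sketch below assumes every beta name follows the pattern `<Base>-<version>d<delta>.beta.<ext>` seen in the listing above. The function `final_names` is hypothetical, and the actual permanent names are whatever the editors settle on.

```python
import re

# Assumed pattern: <Base>-<version>d<delta>.beta.<ext>,
# e.g. "PropList-3.0.1d2.beta.txt" or "ArabicShaping-3d1.beta.txt".
BETA_NAME = re.compile(
    r"^(?P<base>.+?)-(?P<version>[\d.]+)d\d+\.beta\.(?P<ext>\w+)$")


def final_names(beta_name: str) -> tuple[str, str]:
    """Return (versioned name, unversioned UNIDATA name) for a beta file name."""
    m = BETA_NAME.match(beta_name)
    if m is None:
        raise ValueError(f"not a beta file name: {beta_name!r}")
    base, version, ext = m.group("base", "version", "ext")
    # Dropping "beta" and the delta number gives the permanent versioned name...
    versioned = f"{base}-{version}.{ext}"
    # ...and dropping the version too gives the copy placed in UNIDATA.
    unversioned = f"{base}.{ext}"
    return versioned, unversioned


for name in ["ArabicShaping-3d1.beta.txt", "PropList-3.0.1d2.beta.txt"]:
    print(name, "->", final_names(name))
```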

And comparable changes will be made in the ftp hierarchy as well, as
soon as regular ftp service can be restored on the server.

Before that happens, however, we would like to invite all interested
implementers to examine the data files and report any problems they
find in them, so that these can be corrected before the Unicode 3.0.1
update is finalized.

Note that UnicodeData.txt and PropList.txt now explicitly contain
codepoint listings using the 5- or 6-digit UTF-32 notation. If
you are using automated parsers on either of those files, be aware
of this change in convention and make sure your code is prepared
to handle parsing of codepoint values greater than 0xFFFF.
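A parser that copes with the new convention mostly has to stop assuming exactly four hex digits. The sketch below reads the first field of a UnicodeData.txt-style line (fields separated by ";") and accepts 4 to 6 hex digits, capped at U+10FFFF. The helper name `parse_code_point` and the sample line are illustrative assumptions, not copied from the beta files.

```python
import re

# 4 to 6 hex digits and nothing else.
HEX_CP = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_code_point(field: str) -> int:
    """Parse a code point field such as '0041', 'F0000' or '10FFFD'."""
    field = field.strip()
    if not HEX_CP.fullmatch(field):
        raise ValueError(f"bad code point field: {field!r}")
    value = int(field, 16)
    if value > 0x10FFFF:
        raise ValueError(f"code point out of range: {field!r}")
    return value


# Illustrative line in the UnicodeData.txt layout; only field 0 is used here.
line = "F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;"
cp = parse_code_point(line.split(";")[0])
print(hex(cp), cp > 0xFFFF)  # prints: 0xf0000 True
```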

We are introducing this change now with the relatively trivial
listing of user-defined, unassigned, and not-a-character codepoints
past U+FFFF, so people can test out their implementations before
they get whumped with 40,000+ new characters from Planes 1, 2,
and 14 for the upcoming Unicode 3.1.
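For anyone testing along these lines, two things change past U+FFFF: the plane number is no longer always 0, and in UTF-16 each such code point needs a surrogate pair. A minimal sketch using the standard surrogate arithmetic; the function names are hypothetical.

```python
def plane(cp: int) -> int:
    """Plane number: 0 for the BMP; 1, 2, 14, ... for supplementary planes."""
    return cp >> 16


def to_utf16_units(cp: int) -> list[int]:
    """Encode one code point as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if cp <= 0xFFFF:
        return [cp]  # BMP: a single unit (lone surrogates are not rejected here)
    v = cp - 0x10000  # 20 bits to split across the pair
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


for cp in (0x0041, 0x10000, 0x20000, 0xE0001):
    units = " ".join(f"{u:04X}" for u in to_utf16_units(cp))
    print(f"U+{cp:04X} plane {plane(cp)}: {units}")
```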

--Ken Whistler


The changes in the data files from the 3.0.0 release version are
as follows:

ArabicShaping-3d1.beta.txt
   Updated the shaping class for 0671.

CaseFolding-2d1.beta.txt
   This is a new contributory data file. See UTR #21, Case Mappings.

CompositionExclusions-2d1.beta.txt
   Fixed a comment in the file.
   Added a minimal label/version comment at the top of the file.

PropList-3.0.1d2.beta.txt
   Removed F8F0..F8FF from a listing of several properties. (Bug)
   Fixed the default bidi property to LR for all user-defined characters.
   Updated properties for 0E47. (removed from alphabetics, added to
   Extended property listing to full UTF-16 range for user-defined
      characters (including Planes 15 and 16), for bidi LR, and
      for unassigned characters.
   Added not-a-character property (a property of codepoints, not
      of characters), and provided listing for full UTF-16 range.
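Since not-a-character is a property of code points, its listing over the full UTF-16 range can be generated rather than looked up. The sketch below assumes the set is the last two code points of each of the 17 planes (U+nFFFE and U+nFFFF); it does not attempt to reproduce the exact layout of the beta PropList file, and the function names are hypothetical.

```python
def is_not_a_character(cp: int) -> bool:
    """True for U+nFFFE and U+nFFFF in any plane n (assumed definition)."""
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def not_a_character_ranges() -> list[str]:
    """One 'XXXX..YYYY' range per plane, planes 0 through 16."""
    ranges = []
    for plane in range(17):
        first = (plane << 16) | 0xFFFE
        ranges.append(f"{first:04X}..{first + 1:04X}")
    return ranges


listing = not_a_character_ranges()
print(len(listing), listing[0], listing[-1])  # prints: 17 FFFE..FFFF 10FFFE..10FFFF
```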

SpecialCasing-3d2.beta.txt
   Minor fixes to the BNF syntax.
   Addition of the Lithuanian AFTER condition.
   Additions to the notes in the file's comments.
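For readers checking a SpecialCasing parser against the revised syntax, the sketch below splits one entry into the code point, the lower, title and upper mappings, and an optional condition list. It assumes the layout `<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>`; condition names differ between versions of the file, so the sample lines (and the `Final_Sigma` spelling) are illustrative only, and `CasingEntry` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CasingEntry:
    code: int
    lower: list[int]
    title: list[int]
    upper: list[int]
    conditions: list[str]  # language tags and/or context names; may be empty


def parse_mapping(field: str) -> list[int]:
    """'0053 0073' -> [0x53, 0x73]; an empty field gives []."""
    return [int(h, 16) for h in field.split()]


def parse_special_casing(line: str) -> Optional[CasingEntry]:
    """Parse one '<code>; <lower>; <title>; <upper>; (<conditions>;)? # ...' line."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None  # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 4:
        raise ValueError(f"too few fields: {line!r}")
    code, lower, title, upper = fields[:4]
    # The trailing ';' leaves an empty last field, so a condition list,
    # if present, is the fifth field.
    conditions = fields[4].split() if len(fields) > 4 else []
    return CasingEntry(int(code, 16), parse_mapping(lower),
                       parse_mapping(title), parse_mapping(upper), conditions)


sharp_s = parse_special_casing(
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S")
sigma = parse_special_casing(
    "03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA")
print(sharp_s.upper, sigma.conditions)
```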

UnicodeData-3.0.1d1.beta.html
   Corrected a bullet numbering problem.
   Added documentation of range listing for Plane 15 and Plane 16
      user-defined characters.
   Added documentation of 4/5/6 digit hex notation conventions.
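The 4/5/6 digit convention comes down to: uppercase hex, at least four digits, and only as many more as the value needs (five for Planes 1 through 15, six for Plane 16). A minimal sketch of that rule; the helper name is hypothetical.

```python
def format_code_point(cp: int) -> str:
    """Uppercase hex, zero-padded to at least 4 digits, never wider than needed."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return f"{cp:04X}"


for cp in (0x41, 0xFFFF, 0x10000, 0xF0000, 0x10FFFD):
    print(format_code_point(cp))
# prints 0041, FFFF, 10000, F0000, 10FFFD (one per line)
```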

UnicodeData-3.0.1d2.beta.txt
   Added definition ranges for Plane 15 and Plane 16 user-defined characters.
   Added "dena sum" in the ISO comment field for 0FCF.
   Added 10646-1 Annex P asterisk comments to 01A6, 0280.
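Large blocks such as the Plane 15 and Plane 16 user-defined areas are not listed one code point per line; a pair of entries marks the first and last code point, with names of the form `<..., First>` and `<..., Last>`. The sketch below folds such pairs into ranges while reading. The sample lines are an illustrative assumption in that style, using the private-use extents U+F0000..U+FFFFD and U+100000..U+10FFFD, and `iter_ranges` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple


def iter_ranges(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, name) for each entry, folding First/Last pairs."""
    pending = None  # (first code point, base name) awaiting its Last line
    for line in lines:
        fields = line.split(";")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        cp, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending = (cp, name[1:-len(", First>")])
        elif name.endswith(", Last>"):
            if pending is None:
                raise ValueError(f"Last without First at {fields[0]}")
            first, base = pending
            yield first, cp, base
            pending = None
        else:
            yield cp, cp, name


sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;",
    "FFFFD;<Plane 15 Private Use, Last>;Co;0;L;;;;;N;;;;;",
    "100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;",
    "10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;",
]
for first, last, name in iter_ranges(sample):
    print(f"{first:04X}..{last:04X} {name} ({last - first + 1} code points)")
```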

