Unicode 3.0.1 update beta data files available

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Aug 04 2000 - 21:32:56 EDT


The beta directory for the Unicode 3.0.1 update has been created.

Because of the current problem with anonymous ftp on www.unicode.org,
only the http version of this directory is available for now:

http://www.unicode.org/Public/3.0-Update1/

The updated beta files at that location for the Unicode 3.0.1 update are:

   5179 Jul 31 21:14 ArabicShaping-3d1.beta.txt
  43559 Jul 31 21:14 CaseFolding-2d1.beta.txt
   5085 Jul 31 21:14 CompositionExclusions-2d1.beta.txt
  55254 Jul 31 21:14 PropList-3.0.1d2.beta.txt
  13841 Jul 31 21:14 SpecialCasing-3d2.beta.txt
  48261 Jul 31 21:14 UnicodeData-3.0.1d1.beta.html
 636269 Jul 31 21:15 UnicodeData-3.0.1d2.beta.txt

These are temporary names. Once the beta review closes, the "beta" label
and the delta number will be dropped to form each file's permanent
versioned filename, and the latest version of each file will also
be copied into the UNIDATA directory without the version extension.

Comparable changes will be made in the ftp hierarchy as soon as
regular ftp service can be restored on the server.

Before that happens, however, we invite all interested implementers
to examine the data files and report any problems they find, so that
those problems can be corrected before the Unicode 3.0.1 update is
finalized.

Note that UnicodeData.txt and PropList.txt now explicitly contain
codepoint listings using the 5- or 6-digit UTF-32 notation. If
you are using automated parsers on either of those files, be aware
of this change in convention and make sure your code is prepared
to handle parsing of codepoint values greater than 0xFFFF.
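Parsers that assume exactly four hex digits, or that store code points in a 16-bit type, are the kind of code this change will break. What follows is a minimal Python sketch, not an official tool, of field parsing that accepts 4-, 5-, and 6-digit values. The function names are hypothetical, and the "XXXX..YYYY" range syntax is assumed from the notation (e.g. F8F0..F8FF) used in the change notes.

```python
import re

# One code point field: 4 to 6 hex digits, e.g. "0041", "1D400", "10FFFD".
_HEX_CP = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_code_point(field: str) -> int:
    """Parse a single hex code point field, accepting values above 0xFFFF."""
    text = field.strip()
    if not _HEX_CP.fullmatch(text):
        raise ValueError(f"not a 4- to 6-digit hex code point: {field!r}")
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError(f"code point out of range: {field!r}")
    return value


def parse_range(field: str) -> range:
    """Parse "XXXX" or "XXXX..YYYY" (assumed syntax) into an inclusive range."""
    first, sep, last = field.strip().partition("..")
    start = parse_code_point(first)
    end = parse_code_point(last) if sep else start
    if end < start:
        raise ValueError(f"reversed range: {field!r}")
    return range(start, end + 1)


def format_code_point(cp: int) -> str:
    """Format with at least 4 hex digits; larger values widen to 5 or 6."""
    return f"{cp:04X}"
```

For example, parse_range("F0000..FFFFD") covers the 65,534 Plane 15 user-defined code points, and format_code_point round-trips each value back to the same 4/5/6-digit form. Returning a Python range keeps large listings cheap, since the individual code points are never materialized unless the caller iterates.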

We are introducing this change now with the relatively trivial
listing of user-defined, unassigned, and not-a-character codepoints
past U+FFFF, so people can test out their implementations before
they get whumped with 40,000+ new characters from Planes 1, 2,
and 14 for the upcoming Unicode 3.1.

--Ken Whistler

===================================================================

The changes in the data files from the 3.0.0 release version are
as follows:

ArabicShaping.txt

   Updated the shaping class for 0671.

CaseFolding.txt

   This is a new contributory data file. See UTR #21, Case Mappings.

CompositionExclusions.txt

   Fixed a comment in the file.
   Added a minimal label/version comment at the top of the file.

PropList.txt

   Removed F8F0..F8FF from a listing of several properties. (Bug)
   Set the default bidi property to LR for all user-defined character
      ranges.
   Updated properties for 0E47 (removed from alphabetics, added to
      diacritics).
   Extended property listing to full UTF-16 range for user-defined
      characters (including Planes 15 and 16), for bidi LR, and
      for unassigned characters.
   Added not-a-character property (a property of codepoints, not
      of characters), and provided listing for full UTF-16 range.
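As a rough illustration of which code points such a listing spans across the full UTF-16 range, the sketch below assumes the not-a-character property applies to the last two code points of every plane, U+xxFFFE and U+xxFFFF for planes 00 through 10. The function names are hypothetical. Later versions of the standard also designate U+FDD0..U+FDEF as noncharacters; this sketch does not check that range.

```python
def is_plane_end_noncharacter(cp: int) -> bool:
    """True for U+xxFFFE and U+xxFFFF in any of the 17 planes (assumed scope)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    # Masking with 0xFFFE drops bit 0 and the plane bits, so the result
    # equals 0xFFFE exactly when the low 16 bits are FFFE or FFFF.
    return (cp & 0xFFFE) == 0xFFFE


def plane_end_noncharacters():
    """Yield all 34 such code points, plane 0 through plane 16, in order."""
    for plane in range(17):
        base = plane << 16
        yield base | 0xFFFE
        yield base | 0xFFFF
```

A single bit mask is used instead of a lookup table because the pattern repeats identically in every plane.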

SpecialCasing.txt

   Made minor fixes to the BNF syntax.
   Added the Lithuanian AFTER condition.
   Expanded the notes in the comments in the file.

UnicodeData.html

   Corrected a bullet numbering problem.
   Added documentation of range listing for Plane 15 and Plane 16
      user-defined characters.
   Added documentation of the 4-, 5-, and 6-digit hex notation conventions.

UnicodeData.txt

   Added definition ranges for Plane 15 and Plane 16 user-defined
      characters.
   Added "dena sum" to the ISO comment field for 0FCF.
   Added 10646-1 Annex P asterisk comments to 01A6 and 0280.
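UnicodeData.txt does not list every code point of a large block individually; it marks the block with a pair of entries whose names end in ", First>" and ", Last>", as it already does for ranges such as the CJK ideographs. The sketch below assumes the new Plane 15 and Plane 16 definition ranges follow that same convention and expands such pairs while reading the file. The function name is hypothetical, and the range name used in the test data ("Plane 15 Private Use") is illustrative; the exact names in the beta file may differ.

```python
FIRST, LAST = ", First>", ", Last>"


def iter_entries(lines):
    """Yield (code point, name) pairs from UnicodeData.txt-style lines.

    Only field 0 (hex code point, 4 to 6 digits) and field 1 (name) are
    used. A "<Label, First>" / "<Label, Last>" pair is expanded into one
    (code point, "Label") pair per code point in the inclusive range.
    """
    pending = None  # (start code point, label) waiting for its Last entry
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(";")
        cp, name = int(fields[0], 16), fields[1]
        if name.startswith("<") and name.endswith(FIRST):
            if pending is not None:
                raise ValueError(f"nested range start at {fields[0]}")
            pending = (cp, name[1:-len(FIRST)])
        elif name.startswith("<") and name.endswith(LAST):
            label = name[1:-len(LAST)]
            if pending is None or pending[1] != label:
                raise ValueError(f"unmatched range end at {fields[0]}")
            for c in range(pending[0], cp + 1):
                yield c, label
            pending = None
        else:
            yield cp, name
    if pending is not None:
        raise ValueError(f"range {pending[1]!r} has no Last entry")
```

Usage might look like `with open("UnicodeData-3.0.1d2.beta.txt") as f: names = dict(iter_entries(f))`. Because iter_entries is a generator, callers that only need range boundaries can stop early instead of materializing all user-defined code points.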



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT