Re: UnicodeData.txt problem

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Dec 08 2005 - 16:35:33 CST


    Werner Lemberg asked:

    > UnicodeData.txt is, as far as I know, the central file describing the
    > properties of the Unicode characters. As such it is tightly bound to
    > the corresponding Unicode version, and I wonder why one of the most
    > important elements, namely a version tag, is missing from this file.
    > I consider this as a serious problem. Similarly, a copyright notice
    > together with a license should be included, even if it just points to
    > a URL holding the complete text.

    It is a legacy format issue. UnicodeData.txt was the very first
    of the data files defined for the Unicode Standard -- many years
    ago. And there are many existing processes that parse it exactly
    as is. To minimize compatibility problems going forward,
    its format has been frozen for a long time -- and that includes
    not adopting the comment and version conventions that the other
    data files have.
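
    To illustrate why a frozen format matters, here is a minimal sketch
    of the kind of strict parser that existing processes might use. It
    relies on the fact that each line of UnicodeData.txt holds 15
    semicolon-separated fields and that the file carries no comment
    lines. The names FIELD_COUNT and parse_line are hypothetical, and
    the "# UnicodeData-4.1.0.txt" header is only an assumed example of
    what a version tag could look like, not an actual line from the file.

```python
FIELD_COUNT = 15  # each UnicodeData.txt line has 15 semicolon-separated fields


def parse_line(line: str) -> list[str]:
    """Split one UnicodeData.txt line into its fields, rejecting anything else."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return fields


# The entry for U+0041 as it appears in UnicodeData.txt.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_line(sample)
print(record[0], record[1], record[2])

# A comment-style version header (assumed form) would break this parser.
try:
    parse_line("# UnicodeData-4.1.0.txt")
except ValueError as err:
    print("rejected:", err)
```

    A parser written this way has no notion of comments, so adding a
    header line would make it fail on the first line of the file.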

    The version of every UnicodeData.txt file is clear from its
    context in the ftp://www.unicode.org/Public/ directories,
    so if you have a "loose" copy of UnicodeData.txt whose
    version you are unsure of, it can always be determined
    by comparing dates and sizes against the versions in the
    Unicode Character Database, or for absolute certainty, by
    diffing contents.
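
    The check described above can be sketched as follows: narrow the
    candidates by size first, then confirm by comparing contents. This
    is only an illustration under stated assumptions. The function name
    identify_version is hypothetical, and the reference data below are
    short placeholder byte strings standing in for copies you would
    have downloaded from the versioned directories yourself.

```python
def identify_version(loose: bytes, references: dict[str, bytes]) -> list[str]:
    """Return the labels of all reference copies identical to `loose`."""
    # Cheap first pass: only copies of the same size can possibly match.
    same_size = {
        label: data for label, data in references.items()
        if len(data) == len(loose)
    }
    # Certain second pass: byte-for-byte comparison of the contents.
    return [label for label, data in same_size.items() if data == loose]


# Placeholder contents (hypothetical), not real UnicodeData.txt releases.
references = {
    "version A": b"placeholder contents of release A\n",
    "version B": b"placeholder contents of release B, slightly longer\n",
}
loose = b"placeholder contents of release B, slightly longer\n"

print(identify_version(loose, references))  # prints ['version B']
```

    An empty result would mean the loose copy matches none of the
    reference copies, for example because it was edited locally.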

    >
    > I've only looked at version 4.1.0 -- maybe you've fixed this
    > meanwhile.

    No, this will not be changed in Unicode 5.0.0.

    --Ken


