Re: unihan.txt

From: Kenneth Whistler
Date: Wed May 07 2003 - 14:17:49 EDT

    Raymond Mercier noted:

    > I assume from the remarks in
    > that no revision of
    > unihan.txt is proposed for the forthcoming publication of Unicode 4.0.

    That is correct.

    > That
    > would be a pity. When the Unicode standard is founded on consistent
    > scientific control, surely it would be a good idea to revise unihan.txt so
    > that it exhibits a similar spirit.

    The development of the Unicode Standard is essentially an *engineering*
    project, not a scientific one. The editors and the members of the
    Unicode Technical Committee try to apply the best quality assurance
    practices they can to that process, as new versions are rolled out.
    When the decisions were made to release Unicode 4.0, the numerous
    updates for Unihan.txt were not yet completed and fully checked, so
    the release of the update of that file has been delayed until they
    can be completed and checked.

    > Instead it shows an erratic and
    > whimsical character that frustrates all the good uses to which it might be
    > put.

    Well, I'll let the maintainers of Unihan.txt speak as to the whimsical
    character of the data file ;-), but your inquiry presupposes that
    the editors are sitting on their duffs waiting for some indication
    that the file should be updated. Nothing could be further from
    the truth. Massive lists of additions and corrections have already been
    made to Unihan.txt. Those *would* have been released with Unicode 4.0,
    had they been ready in time. They *will* be released, when they are.

    > Even if the text of the new volume is now fixed, surely it is not too late
    > to revise a file that will only go into the CD.

    This unfortunately misunderstands the nature of the publication
    process. The Unicode Standard, Version 4.0 itself has already
    been finalized. It was released on April 17, and is defined
    on the website. The data files associated with that release
    are all final and frozen, and are available online.

    The *book* publication, 'The Unicode Standard, Version 4.0',
    published by Addison-Wesley, is a few months behind. It should
    be available in early September. When it is available, the
    CD-ROM in it will have the same data files as are already
    available online -- and they aren't going to be updated
    piecemeal in ways that would put them out of synch with the
    definitive data files online.

    The opportunity for the update of Unihan.txt is the *next*
    update release of the standard, likely to be Unicode 4.0.1,
    if past practice applies. That will merely be an update of
    the online data files, not accompanied by any further
    changes in the book.

