Re: UnicodeData.txt problem

From: Doug Ewell
Date: Sat Dec 10 2005 - 10:09:45 CST

    Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

    > The complaint wasn't about the format; it was about the content, i.e.
    > the lack of any version marking. The only way of doing that would be a
    > hack like abusing a Unicode_1_Name. However, that could break an
    > application that knows how many characters there were in Unicode 1.0,
    > as could supplying a spurious Unicode_1_Name for a character outside
    > the BMP. (My first thought for such a hack was to put copyright
    > information in the Unicode_1_Name for U+10FFFD or U+10000.)

    When Ken and others say they can't add comments or a version record to
    UnicodeData.txt because doing so would break existing parsers, that
    means the file format does not provide for comments or a version record.
    Support (or lack of support) for comments is just as much a part of the
    file format as whether this field comes before that one.

    When I talk about converting the file you get into the file you want, it
    could be something as simple as adding a comment line containing the
    version number, or adding copyright information to an existing record
    where your apps know to treat it specially. Or it could mean converting
    the file to XML or DBF, or inventing an entirely new format. The
    important thing is the decision to convert the file UTC gives you and
    use the converted version.
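
    A minimal sketch of the simplest conversion described above: prepend a
    comment line carrying the version number, then read the converted file
    with a parser that knows to skip such lines. The "# Version: " marker,
    the function names, and the version string are all hypothetical; none
    of them is part of the UnicodeData.txt format or supplied by UTC.

```python
# Hypothetical marker for the locally converted file; the official
# UnicodeData.txt format has no comment or version lines.
VERSION_PREFIX = "# Version: "


def add_version_line(src_text: str, version: str) -> str:
    """Return the file text with a version comment line prepended."""
    return f"{VERSION_PREFIX}{version}\n{src_text}"


def parse_converted(text: str):
    """Parse the converted text and return (version, records).

    Lines starting with '#' are treated as comments; the one that starts
    with VERSION_PREFIX supplies the version. Every other non-blank line
    is split on ';' into its fields.
    """
    version = None
    records = []
    for line in text.splitlines():
        if line.startswith("#"):
            if line.startswith(VERSION_PREFIX):
                version = line[len(VERSION_PREFIX):].strip()
            continue
        if not line.strip():
            continue
        records.append(line.split(";"))
    return version, records


# One record in the usual semicolon-separated layout.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
converted = add_version_line(sample, "4.1.0")  # version supplied by hand
version, records = parse_converted(converted)
```

    Only the converted file needs the extra line; the file as distributed
    stays untouched, so parsers that expect the original format still work
    on the original.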

    In the case of adding version information to a file that doesn't already
    have it, it is a matter of manually supplying the information, based on
    personal knowledge or file size or whatever. The advantage is that you
    only have to do it once, and don't have to embed the logic into your

    Doug Ewell
    Fullerton, California, USA

