Re: Unicode Digest, Vol 56, Issue 20

From: Doug Ewell via Unicode <unicode_at_unicode.org>
Date: Thu, 30 Aug 2018 12:27:30 -0600

UnicodeData.txt was devised long before any of the other UCD data files. Though it might seem like a simple enhancement to us, adding a header block, or even a single line, would break a lot of existing processes that were built long ago to parse this file.
So Unicode can't add a header to this file, and for the same compatibility reason the format can never be changed in any other way (e.g. by adding more columns). That is why new files keep getting created instead.
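
To make the breakage concrete, here is a minimal sketch of the kind of parser I have in mind. It is only an illustration, not any real tool's code: `parse_line` is a made-up name, and the header line is hypothetical, since UnicodeData.txt has no such line. The data line is the actual entry for U+0041.

```python
# Sketch of a "classic" UnicodeData.txt reader: no comment handling,
# every line is assumed to be a 15-field semicolon-delimited record.
SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"

# Hypothetical header line; the real file does not contain one.
HYPOTHETICAL_HEADER = "# UnicodeData-11.0.0.txt"


def parse_line(line):
    """Return (code point, name, general category) for one record."""
    fields = line.rstrip("\n").split(";")
    # A header or comment line fails right here: its first field is not hex.
    code_point = int(fields[0], 16)
    return code_point, fields[1], fields[2]


print(parse_line(SAMPLE))

try:
    parse_line(HYPOTHETICAL_HEADER)
except ValueError as exc:
    print("header line rejected:", exc)
```

A parser written this way has no reason to look for a comment delimiter, so even one added line would stop it at the first record.
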
The XML format could indeed be expanded with more attributes and more subsections. Any process that can parse XML can handle unknown stuff like this without misinterpreting the stuff it does know.
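
As a rough sketch of what I mean, assuming the UCD XML layout I remember (a `char` element with `cp`, `na` and `gc` attributes in the namespace shown), a reader that asks only for what it knows is unaffected by additions. The `future_prop` attribute and the `future-section` element are invented for this example and do not exist in the real data, and `read_names` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the UCD XML documentation; verify before relying on it.
UCD_NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"

# Tiny stand-in document. "future_prop" and <future-section> are hypothetical.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu" future_prop="x"/>
    <future-section><item/></future-section>
  </repertoire>
</ucd>"""


def read_names(xml_text):
    """Map code point -> (name, general category), ignoring everything else."""
    root = ET.fromstring(xml_text)
    result = {}
    for char in root.iter(UCD_NS + "char"):
        cp = char.get("cp")
        if cp is None:  # skip entries without a single code point
            continue
        result[int(cp, 16)] = (char.get("na"), char.get("gc"))
    return result


print(read_names(SAMPLE))
```

The unknown attribute and the unknown subsection are simply never asked for, so they cannot be misread as something else.
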
That's why the only two reasonable options for getting UCD data are to read all the tab- and semicolon-delimited files, and be ready for new files, or just read the XML. Asking for changes to existing UCD file formats is kind of a non-starter, given these two alternatives.

--Doug Ewell | Thornton, CO, US | ewellic.org
-------- Original message --------
Message: 3
Date: Thu, 30 Aug 2018 02:27:33 +0200 (CEST)
From: Marcel Schneider via Unicode <unicode_at_unicode.org>

Curiously, UnicodeData.txt is lacking the header line. That makes it inflexible.
I never wondered why the header line is missing, probably because compared
to the other UCD files, the file looks really odd without a file header showing
at least the version number and datestamp. It's like the file was made up for
dumb parsers unable to handle comment delimiters, and never to be upgraded
to do so.

But I like the format, and that's why at some point I submitted feedback asking
for an extension. [...]
Received on Thu Aug 30 2018 - 13:28:03 CDT
