From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Fri Dec 09 2005 - 02:32:48 CST
Kenneth Whistler wrote:
> Werner Lemberg asked:
>
>
>>UnicodeData.txt is, as far as I know, the central file describing the
>>properties of the Unicode characters. As such it is tightly bound to
>>the corresponding Unicode version, and I wonder why one of the most
>>important elements, namely a version tag, is missing from this file.
>>I consider this as a serious problem. Similarly, a copyright notice
>>together with a license should be included, even if it just points to
>>a URL holding the complete text.
>
>
> It is a legacy format issue. UnicodeData.txt was the very first
> of the data files defined for the Unicode Standard -- many years
> ago. And there are many existing processes that parse it exactly
> as is. To minimize the problems of compatibility going forward,
> its format has been frozen for a long time -- and that includes
> not adapting the comment and version conventions that the other
> data files have.
What about asking the users (i.e. developers) whether they'd like to
see a redesign of the UCD data files? I find the current structure a
real PITA. Why not simply create one data file for each property,
with a description of that property in the header of each data file?
I vote YES.
You could even create a double set of data files: a new reorganized
set of data files, and a set for backwards compatibility (extracted
from the new set).
You're trying to minimize the amount of work developers have to go
through when they decide to upgrade their software to a new UCD version,
and that is a good thing. But IMHO holding on to the legacy format
actually creates more work rather than less. Suppose a new binary
property Pattern_Filename (whatever) is invented and its data is added
to PropList.txt. Now I'm not interested in using the new property in my
software, but I still need to adapt my PropList.txt parser in order
to cope with the added property. If, on the other hand, the data for
the new property had been put in a new property-specific file, I
wouldn't have to change my code at all! Also, parsers would be much
simpler and it would therefore be easier to add a new parser for
a new property. One could even create parsers mechanically.
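To illustrate how small such a parser could be, here is a minimal
sketch in Python. It assumes a per-property file that lists code point
ranges one per line, with '#' comments, in the same style PropList.txt
uses today. The file name White_Space.txt, its contents, and the
function name parse_ranges are hypothetical; no such file exists in
the UCD.

```python
def parse_ranges(lines):
    """Yield (first, last, fields) for each data line.

    Accepts lines such as '0009..000D ; White_Space # comment' as well
    as bare ranges such as '0020 # SPACE'.
    """
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop trailing comment
        if not data:
            continue                           # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        first, _, last = fields[0].partition("..")
        # A single code point has no '..' part, so reuse 'first'.
        yield int(first, 16), int(last or first, 16), fields[1:]


# Hypothetical contents of a per-property file (say, White_Space.txt).
sample = """\
# White_Space (hypothetical single-property file)
0009..000D  # <control>
0020        # SPACE
"""

ranges = [(a, b) for a, b, _ in parse_ranges(sample.splitlines())]
```

The same function also reads a PropList.txt-style line, returning the
property name as an extra field. With one file per property that field
is simply empty, and a program only opens the files for the properties
it cares about.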
If we look at the future, say in ten or twenty years' time, do you or
the Unicode organization believe the UCD data files will still be
structured and formatted exactly as they are now?
Best regards,
Theo