> I have a question about the file
> <http://www.unicode.org/Public/UNIDATA/Scripts.txt>, the data file for
> UTR#24 (Script Names).
> I see that script-specific combining characters are normally assigned to
> that script. However, a few of them are in the INHERITED class:
> Are these characters used in more than one script? If not, what is the
> reason for having them INHERITED?
My understanding is that the author (Mark Davis) was basically
trying to be conservative in the assignments. Having the script
be INHERITED for these combining marks won't pose any difficulties
in the contexts where Scripts.txt is meant to be used (e.g. regular
expression syntax). Assigning one of these marks to a particular
script, and later discovering that it is used in more than one and
thus should have been INHERITED, is more of a problem than the
other way round.
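
To make the "no difficulties" point concrete, here is a minimal
sketch in Python of how a consumer of the data might treat an
INHERITED mark: it simply takes on the script of the character it
follows. The sample lines only imitate the "range ; script" layout
of Scripts.txt and are not copied from the real file; the function
names and the "Unknown" fallback are my own placeholders, not
anything defined by UTR #24.

```python
# Hand-picked sample in the "range ; Script # comment" layout.
# Illustrative only; the real Scripts.txt is far larger.
SAMPLE = """\
0041..005A    ; Latin # capital letters
0061..007A    ; Latin # small letters
0300..036F    ; Inherited # combining diacritical marks
0391..03A1    ; Greek # capital letters
03B1..03C9    ; Greek # small letters
"""


def parse_scripts(text):
    """Turn data lines into (first, last, script) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        span, script = (field.strip() for field in data.split(";"))
        first, _, last = span.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script))
    return ranges


def script_of(ch, ranges):
    """Look up the raw script value of one character."""
    cp = ord(ch)
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return "Unknown"  # placeholder for anything not in the sample


def resolved_scripts(text, ranges):
    """Pair each character with a script; an Inherited mark takes
    the script of the character before it."""
    result = []
    previous = "Unknown"
    for ch in text:
        script = script_of(ch, ranges)
        if script == "Inherited":
            script = previous
        result.append((ch, script))
        previous = script
    return result


RANGES = parse_scripts(SAMPLE)
```

With this table, resolved_scripts("e\u0301", RANGES) treats the
combining acute as Latin, and the same mark after a Greek alpha
comes out as Greek, so a script-based match does not break at the
mark in either case.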
UTR #24 should make clear that these script assignments are not
notionally the same as a determination of the historic status of
a particular mark in relation to a script or writing system. (The
*names* of the characters are better for that, actually.) Instead,
the assignments in Scripts.txt are a better way of partitioning the
character space for the kinds of things people want to do in regex
search patterns, in an attempt to get people to stop using the awful
stopgap of Unicode block boundaries for the same purpose.
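
As a rough illustration of why block boundaries are a poor stand-in,
the Python sketch below compares a block test with a script lookup
for two characters. The block range is the "Greek and Coptic" block;
the script table is a tiny hand-written excerpt reflecting my
understanding of current script assignments, not the full data, and
the helper names are hypothetical.

```python
# The "Greek and Coptic" block as a plain code point range.
GREEK_BLOCK = (0x0370, 0x03FF)

# Tiny illustrative excerpt of script assignments (not the full data).
SCRIPT_RANGES = [
    (0x037E, 0x037E, "Common"),  # GREEK QUESTION MARK, inside the block
    (0x03B1, 0x03C9, "Greek"),   # small letters alpha..omega
    (0x1F00, 0x1F15, "Greek"),   # polytonic letters, Greek Extended block
]


def in_greek_block(ch):
    """Block-based test: is the code point inside the block range?"""
    first, last = GREEK_BLOCK
    return first <= ord(ch) <= last


def script_of(ch):
    """Script-based test: look the code point up in the excerpt."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Unknown"  # placeholder for anything not in the excerpt


def compare(ch):
    """Return (block answer, script answer) for one character."""
    return in_greek_block(ch), script_of(ch)
```

Here compare("\u1f00") gives (False, "Greek"): a Greek letter that
the block test misses. compare("\u037e") gives (True, "Common"): a
punctuation mark that the block test would wrongly count as Greek
text.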
> Thanks for any info.
> _ Marco
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT