Re: Mixed up priorities

From: John Cowan (cowan@locke.ccil.org)
Date: Mon Oct 25 1999 - 12:54:08 EDT


Otfried Cheong scripsit:

> (1) The situation of LONG-S versus ROUND-S is quite parallel to that
> between FINAL-KAF and KAF.

Indeed.

> (2) Therefore, FINAL-KAF and LONG-S need to be encoded. Not, as has
> been hinted, because they come from an ancient legacy encoding,
> but because they are necessary, here and now.

For both reasons.

> (3) There still remains the question why LONG-S has a compatibility
> decomposition to S, while FINAL-KAF doesn't.

This involves a subtle point (meaning that I myself only figured it out
a short while ago :-) ). To give a character a compatibility decomposition
asserts more than that it is a variant form of one or more other
characters. It further asserts that the character is itself a
compatibility character: i.e. it was encoded solely for compatibility
with something, typically another encoding.
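
The difference can be checked against the Unicode Character Database. The
following is a minimal sketch using Python's standard `unicodedata` module; it
reflects whatever Unicode version ships with the interpreter, not necessarily
the data as it stood when this message was written. The helper name
`compat_mapping` is made up for illustration.

```python
import unicodedata

LONG_S = "\u017f"     # LATIN SMALL LETTER LONG S
FINAL_KAF = "\u05da"  # HEBREW LETTER FINAL KAF


def compat_mapping(ch: str) -> str:
    """Return the decomposition field only if it is a compatibility mapping.

    In the character database, compatibility decompositions carry a tag in
    angle brackets (for example "<compat>"); canonical ones carry no tag.
    """
    field = unicodedata.decomposition(ch)
    return field if field.startswith("<") else ""


print(repr(compat_mapping(LONG_S)))     # '<compat> 0073'
print(repr(compat_mapping(FINAL_KAF)))  # ''

# Compatibility normalization therefore rewrites LONG-S but leaves FINAL-KAF alone.
print(unicodedata.normalize("NFKC", LONG_S))                  # s
print(unicodedata.normalize("NFKC", FINAL_KAF) == FINAL_KAF)  # True
```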

Thus (until I bitched about it recently), ASCII ^ had a compatibility
decomposition of SPACE followed by COMBINING CIRCUMFLEX. This was
a blunder, simply because ASCII ^ is not a compatibility character;
it has acquired many functions of its own, particularly in computer
languages. (The specific problem was that ASCII files would cease
to be pre-normalized in Normalization Forms KC and KD, since every ^
would be rewritten as SPACE followed by COMBINING CIRCUMFLEX.)
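
As a rough check of the pre-normalization property, the sketch below confirms
that, in the data bundled with a current Python, every ASCII code point is left
unchanged by NFKC and NFKD and that ^ carries no decomposition. U+00A8
DIAERESIS is used as a contrasting example of a spacing accent that does have
a SPACE-plus-combining-mark compatibility decomposition; that choice of example
is an assumption of this sketch, not something taken from the message. The
helper name `is_stable` is likewise made up.

```python
import unicodedata


def is_stable(text: str, form: str) -> bool:
    """True if `text` is unchanged by the given normalization form."""
    return unicodedata.normalize(form, text) == text


# Every code point in the ASCII range, U+0000 through U+007F.
ascii_text = "".join(chr(c) for c in range(0x80))

print(all(is_stable(ascii_text, f) for f in ("NFKC", "NFKD")))  # True
print(repr(unicodedata.decomposition("^")))                     # ''

# Contrast: a non-ASCII spacing accent with a compatibility decomposition.
DIAERESIS = "\u00a8"
print(unicodedata.decomposition(DIAERESIS))  # <compat> 0020 0308
print(is_stable(DIAERESIS, "NFKD"))          # False
```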

> When you search for a string in a word-processor, I would like "s"
> to match all of "s", "S", and "long-s". How is this in Hebrew?
> Would you want to find a match with FINAL-KAF if you typed a KAF
> in the search pattern?

Plain-text search in Hebrew isn't very useful, because of the
overlapping morpheme structure.
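
For the search behaviour asked about in the quoted question, one possible
approach is to compare folded keys rather than raw text. The sketch below
applies NFKC followed by case folding, which is enough to make "s", "S", and
LONG-S compare equal. It does not touch FINAL-KAF, since that character has no
compatibility decomposition. The `FINAL_TO_BASE` table and the `fold_finals`
option are hypothetical, application-level additions for anyone who does want
final forms to match; they are not part of Unicode normalization, and nothing
in the message suggests this is advisable for Hebrew.

```python
import unicodedata

# Hypothetical application-level table: Hebrew final forms -> regular forms.
# This is NOT something Unicode normalization does on its own.
FINAL_TO_BASE = str.maketrans({
    "\u05da": "\u05db",  # FINAL KAF   -> KAF
    "\u05dd": "\u05de",  # FINAL MEM   -> MEM
    "\u05df": "\u05e0",  # FINAL NUN   -> NUN
    "\u05e3": "\u05e4",  # FINAL PE    -> PE
    "\u05e5": "\u05e6",  # FINAL TSADI -> TSADI
})


def search_key(text: str, fold_finals: bool = False) -> str:
    """Fold text for loose matching: NFKC, then case folding."""
    key = unicodedata.normalize("NFKC", text).casefold()
    return key.translate(FINAL_TO_BASE) if fold_finals else key


def contains(haystack: str, needle: str, fold_finals: bool = False) -> bool:
    """Substring test on folded keys instead of raw text."""
    return search_key(needle, fold_finals) in search_key(haystack, fold_finals)


print(contains("Congre\u017fs", "congress"))               # True
print(contains("\u05da", "\u05db"))                        # False
print(contains("\u05da", "\u05db", fold_finals=True))      # True
```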

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


