Jonathan Rosenne wrote:
> >Is there a minimal pair in Hebrew that shows that KAF/FINAL KAF are
> >different letters?
> What do you mean? No one denies that they are different shapes of the same
> letter, but we say that you have to encode them at source because the shape
> cannot be determined algorithmically in a practical way.
Thanks, I was hoping you'd say that :-)
> your arguments are correct but are barely relevant and misleading.
On the contrary, I'm very grateful for Arno's details, and they are
certainly relevant to my argument.
Let's draw some conclusions:
(1) The situation of LONG-S versus ROUND-S is quite parallel to that
between FINAL-KAF and KAF. In both cases we clearly see the same
letter, but in different shapes. In a sense, they are glyph
variants, but choosing the wrong one would be considered an error.
In both cases, it is in principle possible to make the decision by
using a large dictionary (TeX's hyphenation algorithm, for instance,
uses patterns derived from one), but clearly that is not a feasible
solution for, say, a simple mail reader.
(But it would be feasible, for instance, if a German publisher
wants to print a book in Fraktur or Suetterlin from an electronic
manuscript that doesn't distinguish round-s and long-s.)
(2) Therefore, FINAL-KAF and LONG-S need to be encoded. Not, as has
been hinted, because they come from an ancient legacy encoding,
but because they are necessary, here and now.
(3) There still remains the question of why LONG-S has a compatibility
decomposition to S, while FINAL-KAF doesn't. I'm not sure what
the consequences of this mapping are, but one theory would be this:
when I search for a string in a word processor, I would like "s"
to match all of "s", "S", and "long-s". What about Hebrew?
Would you want to find a match with FINAL-KAF if you typed a KAF
in the search pattern?
(4) The long philosophical discussions on "What is a letter?" "What is
a character?" "Who am I?" may be fun, but have little impact on
the practice of Unicode.
The distinction between glyph variants that do not need to be
encoded, glyph variants that need to be encoded, and genuinely
different letters depends on locale and time. Principles count
for much less than how those `symbols' are actually used in the
various scripts.
Which `symbols' have been encoded in Unicode with separate code
points, and where, seems to be more a function of previous encodings,
national sensitivities, and national pride than of any firm
principles.
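Points (1) and (3) can be made concrete with a short Python sketch. It assumes a simplified placement rule (round s at the end of each morpheme, long s elsewhere; real Fraktur typesetting has more exceptions) and takes the morpheme boundaries as input, marked with a hypothetical `|` separator, because producing those boundaries is exactly the part that would need a large dictionary. The search half uses only the standard `unicodedata.normalize` (NFKC applies compatibility decompositions) and `str.casefold`. The names `apply_long_s`, `search_key` and `loose_contains` are illustrative only.

```python
import unicodedata

LONG_S = "\u017f"     # LATIN SMALL LETTER LONG S
KAF = "\u05db"        # HEBREW LETTER KAF
FINAL_KAF = "\u05da"  # HEBREW LETTER FINAL KAF


def apply_long_s(segmented: str, boundary: str = "|") -> str:
    """Pick long or round s for a word whose morpheme boundaries are marked.

    Simplified rule (an assumption, not the full Fraktur convention):
    round s at the end of each morpheme, long s everywhere else.
    Supplying the boundaries is the part that needs a large dictionary.
    """
    out = []
    for part in segmented.split(boundary):
        if part.endswith("s"):
            # keep the morpheme-final s round, lengthen any earlier ones
            out.append(part[:-1].replace("s", LONG_S) + "s")
        else:
            out.append(part.replace("s", LONG_S))
    return "".join(out)


def search_key(text: str) -> str:
    """Fold text for loose search matching.

    NFKC applies compatibility decompositions (long s -> s);
    casefold() then removes case distinctions (S -> s).
    """
    return unicodedata.normalize("NFKC", text).casefold()


def loose_contains(pattern: str, text: str) -> bool:
    """True if the folded pattern occurs in the folded text."""
    return search_key(pattern) in search_key(text)


if __name__ == "__main__":
    # Same letters, different words: only the segmentation decides.
    print(apply_long_s("Wach|stube"))  # Wachſtube (guard room)
    print(apply_long_s("Wachs|tube"))  # Wachstube (tube of wax)
    print(unicodedata.decomposition(LONG_S))           # <compat> 0073
    print(repr(unicodedata.decomposition(FINAL_KAF)))  # '' (no decomposition)
    print(loose_contains("s", "Wach" + LONG_S + "tube"))  # True
    print(loose_contains(KAF, FINAL_KAF))                 # False
```

With this folding, "s" finds long-s, but KAF does not find FINAL-KAF, since FINAL-KAF has no compatibility decomposition; a Hebrew search tool wanting the looser behavior would presumably need its own mapping on top of normalization.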
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:54 EDT