RE: Latin ligatures and Unicode

From: Marco.Cimarosti@icl.com
Date: Mon Dec 27 1999 - 15:46:35 EST


Michael Everson wrote:

>I don't think so. The ZWJ does something else. It's a subtle difference,
>but it is a good one.

I think the details will come with your paper, so I won't ask you to do the
work twice. I hope it will be a public paper for all us outsiders to peep
at.

>>whenever a "c" is
>>*immediately* followed by an "h", a "ch" ligature should be used; whenever a
>>"s" is *immediately* followed by a "t", a "<long s>t" ligature should be
>>used.
>>No ZWL is *necessary* in these cases: the ligatures are the "default" for
>>that font, so the users should simply have them or indicate otherwise (using
>>ZWNL!).
>No, that's not right. That's not the way the current model we have for
>Brahmic scripts works either. You could do it this way, but it really
>doesn't fix the problems with Runic etc.

As an end-user, I don't want to waste my time entering information that the
software can infer. It is a different matter when the software cannot infer
it, or when its inference does not comply with *my* rules. In those cases I
can afford to spend a little of my time to fix it. When I write in English
(whose hyphenation rules are a mystery to me!), I want the software to
hyphenate properly for me. Only if, exceptionally, the hyphenation of a
specific word does not satisfy me (because, e.g., the word is not in the
dictionary) am I willing to spend my time adding ad hoc hyphenation
information.

Similarly, every system should use rules to produce ligatures, when these
rules exist, and only ask me to bother about it when these rules don't exist
or don't satisfy me.
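
A rough sketch of the rule I have in mind, in Python. Everything here is an
assumption made for illustration: the ligature table is a made-up one for a
single hypothetical font, the glyph names are invented labels, and U+200C
(ZERO WIDTH NON-JOINER) stands in for the "ZWNL" discussed above. It is not
how any real rendering engine works.

```python
ZWNJ = "\u200c"  # assumption: stand-in for the "ZWNL" of this discussion

# Hypothetical default ligatures of one font (labels are invented).
DEFAULT_LIGATURES = {"ch": "<ch>", "st": "<long-s t>"}

def shape(text, ligatures=DEFAULT_LIGATURES):
    """Return a list of glyph labels, ligating by default."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ligatures:
            # Two *immediately* adjacent letters: use the font's ligature.
            glyphs.append(ligatures[pair])
            i += 2
        elif text[i] == ZWNJ:
            # The control has no glyph of its own; by sitting between two
            # letters it has already prevented the pair from matching.
            i += 1
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs
```

With this, shape("st") gives the ligature without any extra input from the
user, and only shape("s" + ZWNJ + "t") needs the user to type a control.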

I am not sure who "we" refers to... If you mean the Unicode Consortium, my
understanding of the Indic encoding must be completely wrong, because it
seems to me rather similar to what I described: I just add the viramas (i.e.
phonetic information, which simply indicates that the inherent "a" vowel is
not to be pronounced) and the software also uses this information to choose
the correct ligatures or contextual glyphs. Only in exceptional cases do I
need to add ZW*J controls to obtain special things (e.g. a half consonant in
isolation).
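
To make my reading of the Indic model concrete, here is a much simplified
Python sketch for Devanagari. The code points are the real ones (virama
U+094D, consonants U+0915..U+0939, ZWJ U+200D, ZWNJ U+200C), but the function
name and the labels it returns are invented for illustration, and a real
shaping engine does far more and depends on what the font provides.

```python
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

def is_consonant(ch):
    # Devanagari consonants KA..HA
    return "\u0915" <= ch <= "\u0939"

def describe(text):
    """Label what each consonant asks of the renderer (hypothetical labels)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if is_consonant(ch) and text[i + 1:i + 2] == VIRAMA:
            nxt = text[i + 2:i + 3]
            if nxt == ZWJ:
                # Exceptional case: the user explicitly asks for a half form.
                out.append(("half-form", ch))
                i += 3
            elif nxt == ZWNJ:
                # Exceptional case: the user asks for a visible virama.
                out.append(("explicit-virama", ch))
                i += 3
            elif nxt and is_consonant(nxt):
                # Normal case: the virama alone lets the software try a conjunct.
                out.append(("conjunct-candidate", ch))
                i += 2
            else:
                out.append(("explicit-virama", ch))
                i += 2
        else:
            out.append(("plain", ch))
            i += 1
    return out
```

The normal case needs nothing but the virama, which I type anyway for
phonetic reasons; the ZW*J controls appear only in the two exceptional
branches.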

>>Call it what you like: "word boundary", "morpheme boundary", "lexeme
>>boundary": however, it is the position of this boundary that makes
>>"Wachs/tube" different from "Wach/stube", not the presence or the absence of
>>a ligature in Fraktur.
>But it is the correct use of ligatures that makes the word unambiguous in
>Fraktur where it is not in Roman.

But text comes first! Font is just one *attribute* among many others: I
could change it in an Augenblick just by choosing a font name in a
combo-box.

Adding structural information (such as word boundaries) has a more general
usefulness in any kind of plain- or fancy text, for a wider range of
applications.
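
A small Python sketch of what I mean, using the Wachstube example. The "|"
boundary marker is purely hypothetical (it is not a proposal for an actual
character), the glyph labels are invented, and the Fraktur rules are reduced
to "ligate "ch" and "st" only inside a morpheme"; the long s versus round s
distinction is otherwise ignored.

```python
BOUNDARY = "|"  # hypothetical morpheme-boundary marker, for illustration only

def plain(marked):
    """The text as a Roman font would show it: the boundary is invisible."""
    return marked.replace(BOUNDARY, "")

def fraktur_glyphs(marked):
    """Derive Fraktur ligatures from the boundary instead of encoding them."""
    glyphs = []
    for part in marked.split(BOUNDARY):
        i = 0
        while i < len(part):
            if part[i:i + 2] == "st":
                glyphs.append("<long-s t>")
                i += 2
            elif part[i:i + 2] == "ch":
                glyphs.append("<ch>")
                i += 2
            else:
                glyphs.append(part[i])
                i += 1
    return glyphs
```

Both "Wachs|tube" and "Wach|stube" give the same plain() result, while
fraktur_glyphs() ligates "st" only in the second. The same marker could also
serve a hyphenator or a spelling checker, which a "ligate here" control could
not.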

Adding purely graphical information such as "ligate these two letters" only
makes sense when this is really a purely graphical choice, with no other
meaning. I understood that your examples with the Runic scripts were of this
kind: "whether or not a ligature is used has absolutely no meaning here, but
I want to show you exactly what the scribe wrote on that inscription anyway,
because this information is relevant for us".

And indeed, when we are dealing with extinct languages, or with texts that
may possibly contain hidden messages, we cannot be totally sure that what
seems to be an arbitrary graphical choice isn't really a meaningful feature.
So it makes sense to have a device to encode the graphic difference, just to
be as literal as possible. And it makes sense to have it in plain text,
because a character set is a character set, not a word processor, and it
should not rely too much on font technologies... Who said that the primary
thing I want to do with my text is to display or print it, rather than, say,
store it in a database for statistical research?
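
For instance, if the scribe's ligatures are recorded in the plain text, a few
lines of Python can count them with no font or renderer involved. This is
only a sketch: I assume here that U+200D (ZERO WIDTH JOINER) is the mark
placed between two ligated letters, and the function name is made up.

```python
from collections import Counter

ZWJ = "\u200d"  # assumption: the mark recorded between two ligated letters

def ligated_pairs(text):
    """Count which letter pairs the transcription marks as ligated."""
    counts = Counter()
    for i in range(1, len(text) - 1):
        if text[i] == ZWJ:
            counts[(text[i - 1], text[i + 1])] += 1
    return counts
```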

_ Marco


