Re: ~(char=glyph) (was Re: Normalization Form KC for Linux)

From: Erland Sommarskog (sommar@algonet.se)
Date: Tue Aug 31 1999 - 18:13:19 EDT


Edward Cherlin <edward.cherlin.sy.67@aya.yale.edu>
> Well, that's from German, since their orthography got ä and ö from ae and
> oe in the first place. Where did yours come from?

It is true that the <ä> glyph originally comes from writing a small
<e> above an <a>, and that equivalence is still maintained in the
German mindset. It isn't in Swedish, although the origin is the
same. For Finnish, <ä> was just a practical glyph that was at hand
when a written script for that language was devised. In both Swedish
and German, <ä> often does reflect an umlauted <a>, but not so in
Finnish. (Hm, given the vowel harmony of Finnish, that's really true,
but let's ignore that.)

Now, this is not a new concept. The same goes for <j>, <u> and <w> in
the English alphabet. They are derived from letters in the Latin
alphabet, and have achieved a status of their own.

There have been some silly arguments on both sides of this debate.
Let's face it, there are no inherent properties of the letters
which determine which letters can appear only precomposed, and
which can be decomposed. It is simply based on pragmatic and
historic reasons. To wit: the dominating language in the history
of computing has been English.

Whether Dan and I like it or not, we will have to face that <ä>
may appear as <a> + <¨>. Not because some stupid Anglophone thinks
that it is just an <a> with two dots over it, but because there is a
need for a non-spacing diaeresis anyway. Say that I one day get the
bright idea of devising a new, spelling-consistent script for
Swedish. Maybe I will find that <w> + <¨> is a character I need. With
Unicode I can do it. (I presume.)
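
To make the <a> + <¨> case concrete, here is a minimal sketch using
Python's standard unicodedata module. It assumes the non-spacing
diaeresis is U+0308 (the spacing <¨> is a different code point,
U+00A8). Since <w> with diaeresis happens to have a precomposed code
point in current Unicode data, <q> stands in as a base letter that has
none; that choice is only an illustration.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"  # non-spacing; not the spacing U+00A8

decomposed_a = "a" + COMBINING_DIAERESIS  # two code points
precomposed_a = "\u00e4"                  # ä as a single code point

# As raw code point sequences the two spellings are different strings.
same_raw = decomposed_a == precomposed_a

# Normalization maps each spelling onto the other.
composed = unicodedata.normalize("NFC", decomposed_a)
decomposed = unicodedata.normalize("NFD", precomposed_a)

# <w> + diaeresis composes to a single code point (U+1E85).
w_diaeresis = unicodedata.normalize("NFC", "w" + COMBINING_DIAERESIS)

# <q> + diaeresis has no precomposed form, so it stays a sequence.
q_diaeresis = unicodedata.normalize("NFC", "q" + COMBINING_DIAERESIS)

print(same_raw, composed == precomposed_a, decomposed == decomposed_a)
print(len(w_diaeresis), len(q_diaeresis))
```

The point of the last line is that a new letter built from a base and
a combining mark needs no code point of its own to be representable.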

And thus we find that there is actually a letter in the English
language that could be a victim of decomposition, to wit <i>. Instead
of having the precomposed letter, you might get a dotless <i> + dot
above. Whether this is legal, I don't know, but it would be illogical
if it weren't. (And for Dean A. Snyder, who questioned the need for a
dotless <i>: you need to pick up some Turkish.)
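
The dotless <i> question can be checked against whatever Unicode
data a given Python ships with. The sketch below assumes dotless <i>
is U+0131 and the non-spacing dot above is U+0307, and only reports
what the normalization tables say; it makes no claim about what the
standard ought to do.

```python
import unicodedata

DOTLESS_I = "\u0131"
COMBINING_DOT_ABOVE = "\u0307"

# The hypothetical "decomposed <i>" from the paragraph above.
candidate = DOTLESS_I + COMBINING_DOT_ABOVE

# NFC leaves the sequence alone: it is not treated as equivalent to <i>.
composes_to_i = unicodedata.normalize("NFC", candidate) == "i"

# Plain <i> has no canonical decomposition either.
i_is_atomic = unicodedata.normalize("NFD", "i") == "i"

# The Turkish capital dotted I (U+0130) does decompose, to <I> + dot above.
capital_dotted = unicodedata.normalize("NFD", "\u0130")

print(composes_to_i, i_is_atomic, capital_dotted == "I" + COMBINING_DOT_ABOVE)
```

So in these tables the sequence is representable, but it is not
treated as the same thing as an ordinary <i>.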

Now, I don't know what the established practice is, if there is any.
But in my opinion, the user should be spared from keeping track
of whether the <ä> he has in front of him is a precomposed letter or
an <a> + <¨>. Actually, I submit that the default mode for most tools
should be to feel free to switch to the representation which fits
the tool best. A tool that prefers to work with precomposed
letters could translate decomposed ones when reading input, and
may write it back that way.
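
A rough sketch of such a tool, again with Python's unicodedata. The
function names load, save and detect_form are hypothetical, invented
for this illustration. The tool works on precomposed (NFC) text
internally; save writes precomposed text by default, and can restore
the form the input arrived in if that is what "that way" should mean.

```python
import unicodedata


def detect_form(raw):
    """Guess how the input was encoded (hypothetical helper).

    Simplification: anything that is not already NFC, including
    mixed input, is reported as decomposed.
    """
    if raw == unicodedata.normalize("NFC", raw):
        return "NFC"
    return "NFD"


def load(raw):
    """Return (text in the tool's preferred precomposed form, input form)."""
    return unicodedata.normalize("NFC", raw), detect_form(raw)


def save(text, form="NFC"):
    """Write text out, precomposed by default or in the requested form."""
    return unicodedata.normalize(form, text)


raw = "ba\u0308r"                  # "bär" with a decomposed <ä>
text, form = load(raw)             # the tool sees three code points
edited = text.replace("b", "B")    # stand-in for whatever the tool does
restored = save(edited, form)      # back in the form it arrived in
precomposed = save(edited)         # or simply left precomposed

print(len(text), form, restored == "Ba\u0308r", precomposed == "B\u00e4r")
```

Either way, the user never has to know which form was on disk.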

--
Erland Sommarskog, Stockholm, sommar@algonet.se


