~(char=glyph) (was Re: Normalization Form KC for Linux)

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Sat Aug 28 1999 - 20:05:35 EDT

At 05:15 -0700 8/28/1999, Paul Keinänen wrote:
>At 03:31 28.8.1999 -0700, Michael Everson wrote:
>>At 01:54 -0700 1999-08-28, Dan wrote:
>>>And if you work with linguistics, an ä cannot be decomposed when you
>>>work with Swedish, as it is a single letter. The dots above are not an
>>>accent or diacritic mark. So here is a case where you need to
>>>be able to represent what looks like the same glyph "an a with
>>>two dots above", both as one character and as an a with combining dots.

Paul is talking about formatted, tagged text, not plain text, so his
objection has nothing whatsoever to do with Unicode. Linguistics does not
deal with computer characters as its base data, but primarily with speech
and secondarily with letters of an alphabet, for which user nations have
the right to prescribe what rules they will. Just remember that those rules
apply to letters used in writing a particular language, not to characters
in a code set or glyphs in a font or even letters of any alphabet when used
to write in any other language.

>>Uh, you mean that it can't be displayed as a¨, right?
>Of course you can have a _glyph_ encoding in which the glyph ä is
>represented by a¨, but I guess that most people in Sweden or Finland would
>have problems deciphering it.

Merkins[1], too. Indeed, we do mean not only that we have different rules
for glyphs and characters, but that we do not consider whether people can
read internal data structures or file formats without special knowledge and
training. Producing a readable display is the function of rendering
software and graphics hardware, and of nothing else.
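
A minimal sketch of that split between stored characters and displayed
glyphs, using Python's standard unicodedata module. The language choice
and the helper names code_points and same_text are assumptions made for
illustration only; nothing in this thread prescribes them. It shows that
a precomposed ä and an a followed by a combining diaeresis are different
code point sequences that normalization treats as the same text, while
a followed by the spacing diaeresis (the a¨ above) is something else.

```python
import unicodedata

# Two stored forms that a renderer should draw as the same glyph.
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 followed by U+0308 COMBINING DIAERESIS

# A third form: "a" followed by the spacing U+00A8 DIAERESIS.
spacing = "a\u00a8"


def code_points(s):
    """Hypothetical helper: list the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]


def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# As raw code point sequences the first two strings differ.
assert precomposed != decomposed

# Canonical normalization converts between them in both directions.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed

if __name__ == "__main__":
    for label, s in [("precomposed", precomposed),
                     ("decomposed", decomposed),
                     ("spacing", spacing)]:
        print(label, code_points(s), same_text(s, precomposed))
```

Which form a file stores is then a storage decision; a comparison made
through something like same_text does not depend on it.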

>This is apparently some Anglo-Saxon tradition,

Us Anglos don't use them funny letters. You can't pin that rap on us. ;-)

>as is the transliteration of ä->ae or ö->oe, which is never used here (a
>and o are used instead if ä and ö are not available).

Well, that's from German, since their orthography got ä and ö from ae and
oe in the first place. Where did yours come from?

>Paul Keinänen

[1] Colloquial British for "American", as in the character of U.S.
President Merkin Muffley in the movie Dr. Strangelove (played by Peter
Sellers, along with two other characters, one British, one German).

Edward Cherlin   edward.cherlin.sy.67@aya.yale.edu
"It isn't what you don't know that hurts you, it's
what you know that ain't so."--Mark Twain, or else
some other prominent 19th century humorist and wit

