Re: Normalization Form KC for Linux

From: Michael Everson ([email protected])
Date: Sun Aug 29 1999 - 07:36:28 EDT


At 11:03 +0200 1999-08-29, Dan Oscarsson wrote:

>> >For example:
>> >- having non spacing combining characters after instead of
>> >before base character.
>>
>> I understood that it's much better to have them after.
>I have not yet located why. I can see ways where software can
>handle them much more easily if they come before.

I think it's a question of logic. One writes an o and then puts a ~ on top
of it. On typewriters, they switched this around because of the mechanical
movement of the carriage and nonspacing technology. But for instance a
smart font would take the bounding box

>Well, the Swedish alphabet includes ä but not à. The latter is an
>"a with an accent above", while ä is a letter in itself.
>Just because the glyph looks like an "a with two dots above", it is not.
>It is wrong to decompose a character just because its glyph looks
>like it is composed of an accent and another character.

So what? Your answer shows exactly what I said: that ä and à differ only in
the minds of Swedes.

1. Both are used in Swedish texts (2 biljetter à 40 kronor).

2. Both can be represented in the UCS in two ways: ä by 00E4 or 0061 +
0308, and à by 00E0 or 0061 + 0300.

3. à is sorted as a variant of a with a diacritical mark (p. 1 of Norstedts
svensk-engelska ordbok 1994).

4. ä is sorted as a separate letter of the alphabet (p. 855 of Norstedts
svensk-engelska ordbok 1994).

Technically, there is no difference except in Swedish implementation.
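
To make points 2 to 4 concrete, here is a minimal Python sketch using the standard unicodedata module. It shows that both letters have a precomposed and a decomposed spelling that normalize to each other, and that the Swedish treatment of ä can live entirely in the sort key. The names nfc, nfd, SWEDISH_ALPHABET and swedish_sort_key are hypothetical and chosen only for illustration, and the ordering is a deliberately simplified assumption, not a full Swedish collation.

```python
import unicodedata

# Precomposed and decomposed spellings of the two letters.
A_DIAERESIS = "\u00e4"               # ä
A_GRAVE = "\u00e0"                   # à
A_DIAERESIS_DECOMPOSED = "a\u0308"   # a + COMBINING DIAERESIS
A_GRAVE_DECOMPOSED = "a\u0300"       # a + COMBINING GRAVE ACCENT


def nfd(text):
    """Canonical decomposition: base character followed by combining marks."""
    return unicodedata.normalize("NFD", text)


def nfc(text):
    """Canonical composition: precomposed characters where they exist."""
    return unicodedata.normalize("NFC", text)


# Assumption: a simplified Swedish alphabet in which å, ä, ö follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def swedish_sort_key(word):
    """Toy sort key: å, ä, ö are letters in their own right, while any other
    accented letter is ranked as its base letter. Real collation needs more
    levels (for example accent and case tie-breaking)."""
    key = []
    for ch in nfc(word.lower()):
        if ch in SWEDISH_ALPHABET:
            key.append(SWEDISH_ALPHABET.index(ch))
        else:
            base = nfd(ch)[0]  # first code point of the decomposition
            if base in SWEDISH_ALPHABET:
                key.append(SWEDISH_ALPHABET.index(base))
            else:
                key.append(len(SWEDISH_ALPHABET))  # everything else sorts last
    return key


words = ["\u00e4ra", "zon", "\u00e0", "apa"]
ordered = sorted(words, key=swedish_sort_key)
```

At the character level the two letters behave identically: each decomposes and recomposes the same way. Only the tailored sort key treats them differently, placing à with a and ä after z.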

--
Michael Everson * Everson Gunn Teoranta * http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire


