Re: diaeresis/umlaut

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jun 14 1999 - 23:57:09 EDT


There have been two recent submissions to this list on what constitutes
reasons for disunification of characters.

A)

The reason is quite simple, they look the same, they have almost always been
treated the same, and most people will not be aware of their differences,
or don't care, so they will use one for the other, and effectively, we will
just end up with two characters that we will have to treat the same way.

B)
They must
- look reasonably the same (if more than one appearance, then both are never
needed in a given context; e.g. Greek & Coptic, CJK),
- never be distinguished in existing encoding standards
- never have different behaviour that requires different treatment by any
process

Here's my synthesis:

Characters should be unified if most of the following hold:

--They look the same -AND-
--Alternate appearances can be distinguished by context
--They have almost always been treated the same
--Most people will not be aware of their differences, or don't care, so
they will use one for the other
--No major existing character set in which they exist distinguishes them
--They don't have different behaviour that requires different treatment by
  most processes
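
To make "most of the following hold" concrete, here is a minimal Python
sketch of the checklist. The class and function names are hypothetical, and
reading "most" as a strict majority is my assumption for illustration only;
the result is at best a starting point for discussion, not a decision.

```python
from dataclasses import dataclass, fields


@dataclass
class UnificationChecklist:
    """One flag per criterion in the list above (names are hypothetical)."""
    look_the_same: bool
    appearances_distinguished_by_context: bool
    historically_treated_the_same: bool
    users_interchange_them: bool
    no_major_charset_distinguishes: bool
    same_processing_behaviour: bool


def criteria_met(checklist: UnificationChecklist) -> int:
    """Count how many of the criteria hold."""
    return sum(bool(getattr(checklist, f.name)) for f in fields(checklist))


def suggests_unification(checklist: UnificationChecklist) -> bool:
    """Assumption: "most" is read as a strict majority of the criteria."""
    return criteria_met(checklist) * 2 > len(fields(checklist))


# Hypothetical candidate pair: everything holds except identical behaviour.
candidate = UnificationChecklist(
    look_the_same=True,
    appearances_distinguished_by_context=True,
    historically_treated_the_same=True,
    users_interchange_them=True,
    no_major_charset_distinguishes=True,
    same_processing_behaviour=False,
)
print(criteria_met(candidate), suggests_unification(candidate))
```

A tally like this only organizes the argument; it does not weigh the
criteria against each other.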

Even with such a set of rules, the decision will have to be made on a
case-by-case basis. This is not the first time such rules have been written
down, btw. A lot of this information can be found in the various guidelines
to submitters of new character proposals.

Finally, once a character is in Unicode and has been widely used, it's
pretty nearly impossible to find a case where the benefit of 'splitting' it
outweighs the costs. For heavily used characters, one can expect that most
implementations have by now found ways to deal with any ambiguities that
exist, and changing the encoding would invalidate them needlessly.
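
For the case in the subject line: Unicode encodes umlaut and diaeresis as a
single combining mark, U+0308 COMBINING DIAERESIS. The short Python sketch
below uses the standard unicodedata module to show two precomposed letters
decomposing to that same mark. The helper name base_and_mark is hypothetical
and only there for illustration.

```python
import unicodedata

MARK = "\u0308"  # COMBINING DIAERESIS, used for both umlaut and diaeresis


def base_and_mark(ch: str) -> tuple[str, str]:
    """Split a precomposed letter into its base letter and combining marks (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1:]


a_umlaut = "\u00e4"     # as in German "Bär"
i_diaeresis = "\u00ef"  # as in French "naïf"

print(unicodedata.name(MARK))
print(base_and_mark(a_umlaut) == ("a", MARK))
print(base_and_mark(i_diaeresis) == ("i", MARK))
```

Any existing process that relies on this shared decomposition would have to
change if the mark were ever split in two.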

A./


