Re: diaeresis/umlaut

From: John Cowan (
Date: Sat Jun 12 1999 - 21:20:47 EDT scripsit:

> > I think there are other non-spacing characters (diacritics) that have the same
> > Unicode character code value but different meanings in different scripts. And
> > like Mr. Figge I begin to wonder why these two meanings are not treated
> > differently, like Latin A, Greek Alpha and Cyrillic A have different code
> > values. Maybe someone can clarify this.
> I believe the main reason that these were kept separate is for round-trip
> convertibility with existing standards.

There's another reason: the search problem. If you search a multilingual
document for "ABC" you do not want Cyrillic A-Ve-Es being found too.
It's quite bad enough that Fullwidth-A-B-C will be missed by a naive
search algorithm, but at least those are compatibility equivalents.
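
To make the distinction concrete, here is a minimal sketch in Python, assuming
the standard library's unicodedata module and NFKC normalization as one way to
fold compatibility equivalents before searching. The helper name nfkc_find is
hypothetical and only for illustration; it is not taken from any standard or
from the thread.

```python
import unicodedata

def nfkc_find(haystack: str, needle: str) -> bool:
    """Substring test after folding compatibility equivalents with NFKC."""
    fold = lambda s: unicodedata.normalize("NFKC", s)
    return fold(needle) in fold(haystack)

latin = "ABC"
fullwidth = "\uFF21\uFF22\uFF23"  # FULLWIDTH LATIN CAPITAL LETTER A, B, C
cyrillic = "\u0410\u0412\u0421"   # CYRILLIC CAPITAL LETTER A, VE, ES

# A naive code-point comparison misses the fullwidth letters.
naive_fullwidth = latin in fullwidth

# After NFKC folding, the fullwidth letters match Latin "ABC".
folded_fullwidth = nfkc_find(fullwidth, latin)

# Cyrillic A-Ve-Es are distinct characters, not compatibility
# equivalents, so they stay unmatched even after folding.
folded_cyrillic = nfkc_find(cyrillic, latin)
```

Because the Cyrillic letters have their own code values and no compatibility
mapping to Latin, the folded search still leaves them out, which is the
behaviour wanted here.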

John Cowan
		e'osai ko sarji la lojban.
