Re: Umlaut and diaeresis

From: Lars Henrik Mathiesen
Date: Tue Jun 22 1999 - 11:57:43 EDT

> From: Keld Jørn Simonsen <>
> Date: Tue, 22 Jun 1999 03:59:20 -0700 (PDT)

> Well, the aa is sorted as an å also in foreign words and names

This is however completely counterintuitive to any Dane who was not on
the relevant Danish Standards committee. The surnames Isak and Isaak
are unexpectedly two pages apart in the Copenhagen phone book --- and
there aren't many people who will think to look in both places.

> and it is recommended that a soft hyphen SHY then is introduced
> between the two a's.

Is that the SHY that some editors put in the buffer for display when a
word needs to be broken, and remove again when the paragraph is reflowed?
Or the SHY that's permanent in the document, and means that the word
can only be broken at the place marked? I don't think you can rely on
Latin-1 SHY having suitable semantics for this, but perhaps Unicode
ZWNJ could be used.
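
To make the sorting problem concrete, here is a minimal sketch in Python.
It is not the algorithm from the Danish standard: the name danish_key, the
toy alphabet table, and the decision to let ZWNJ (U+200C) break the aa
digraph are all assumptions made for illustration.

```python
# Hypothetical sketch, not the real Danish Standards rules:
# "aa" collates as "å" unless a ZWNJ (U+200C) separates the two a's.
ZWNJ = "\u200c"
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"   # å sorts last in Danish
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def danish_key(word):
    """Return a list of letter ranks usable as a sort key."""
    text = word.lower().replace("aa", "å")   # adjacent a's become å
    text = text.replace(ZWNJ, "")            # the breaker itself is ignored
    # Characters outside the toy alphabet sort after everything else.
    return [RANK.get(ch, len(ALPHABET)) for ch in text]

names = ["Isak", "Isaak", "Isa" + ZWNJ + "ak", "Isbrand", "Iversen"]
sorted_names = sorted(names, key=danish_key)
# Plain "Isaak" lands after "Isbrand" because its aa is ranked as å;
# the ZWNJ-marked spelling stays next to "Isak".
```

With this key, the unmarked Isaak ends up well away from Isak, which is
the phone book effect described above, while the spelling with ZWNJ
between the a's sorts where most readers would look for it.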

> From: "Karl Pentzlin" <>
> Date: Tue, 22 Jun 1999 03:27:00 -0700 (PDT)

> Fortunately, nobody requests two different encodings of the ring above
> (U+030A) for that reason.

I know that one of the character sets for Danish library use had a
separate code point for digraph aa, so that Danish names using aa for
the å sound could be sorted together with the variants spelled with å
without mangling the sort order for foreign names. I don't know if it
was used consistently on input, though --- perhaps only when it made a
difference (like for Aagaard/Ågård).

(Sorry, no real reference --- I saw a listing of this charset during a
student job at Roskilde University Library some 20 years ago, but I'm
not even sure if it was the one they used themselves. They made paper
tapes on Flexowriters in DANMARC format (1975 version), but there does
not seem to be any online info on the charset used with that format).

Note: I'm emphatically not suggesting that this character should be
added to Unicode --- I'm sure all data using it are on tapes that have
rotted by now, or have been converted to Latin-1 already. (And a conversion
to Unicode could use "a ZWJ a" to represent it if necessary).
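
For comparison, the opposite convention (only an explicitly marked
digraph sorts as å) could be sketched as below. This is an illustration
only: marked_digraph_key and the alphabet table are hypothetical names,
and nothing here reflects the actual library charset or its conversion.

```python
# Hypothetical sketch: only "a ZWJ a" (U+200D between the a's) is treated
# as the digraph and ranked as "å"; a plain "aa" remains two letters.
ZWJ = "\u200d"
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def marked_digraph_key(word):
    """Return a list of letter ranks usable as a sort key."""
    text = word.lower().replace("a" + ZWJ + "a", "å")
    # Characters outside the toy alphabet sort after everything else.
    return [RANK.get(ch, len(ALPHABET)) for ch in text]

names = ["Ågård", "Isaak", "A" + ZWJ + "aga" + ZWJ + "ard", "Isak", "Aaron"]
ordered = sorted(names, key=marked_digraph_key)
# The marked Aagaard gets the same key as Ågård, while Aaron and Isaak
# keep their plain aa and sort among the a's.
```

Under this convention foreign names need no special treatment, at the
cost of marking every Danish aa on input.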

Lars Mathiesen (U of Copenhagen CS Dep) <> (Humour NOT marked)

