Re: Fixing Two Unicode Asymmetries in case conversion

From: AddisonP (AddisonP@simultrans.com)
Date: Fri Nov 13 1998 - 12:32:17 EST


The "old German" characters (weren't these called "Fraktur" characters?) and
"old English" ones (not really "Old English", which would be the language
of Beowulf...) are indeed related, in part because of German influence on type
design.

It seems that we are revisiting this issue quite frequently on this list. The
dotless-i/sharp-s problem exists because natural language is a human (and
therefore highly complex) system. It is hard to write a single
toupper()/tolower() function set for any character set other than ASCII, and
creating unambiguous one-to-one case pairs IS attractive for that programmatic
purpose. Casing and sorting are difficult enough problems in an increasingly
multilingual world...
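To make the toupper()/tolower() point concrete, here is a small Python sketch. It relies on Python's built-in str methods, which apply current Unicode case mappings (not anything that existed in 1998); per_char_upper() is a hypothetical stand-in for a function that must map one code point to exactly one code point.

```python
def per_char_upper(s: str) -> str:
    """Hypothetical ASCII-style toupper(): one code point in, one out.
    If the full Unicode uppercase mapping is longer than one character,
    the character is left unchanged, as a 1:1 table would have to do."""
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)

word = "Fußball"
print(word.upper())           # FUSSBALL: the full mapping turns ß into two letters
print(per_char_upper(word))   # FUßBALL: a 1:1 mapping cannot expand ß
print(word.upper().lower())   # fussball: the round trip does not restore ß

# Turkish dotless i (U+0131) uppercases to plain I, which lowercases to dotted i
print("\u0131".upper())          # I
print("\u0131".upper().lower())  # i (str.lower() is locale-independent)
```

Both round trips lose information, which is exactly why one-to-one pairs look attractive to implementers.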

On the other hand, I thought that one of the base rules of Unicode is/was "a
difference that makes no difference is no difference" and that we are encoding
individual unique characters and not glyphs. Since it appears that Unicode
contains all of the requisite individual characters for these combinations,
doesn't that mean we should now all focus on implementing them correctly in
each locale (a programming task, not a standards task)? Otherwise the list of
visually and functionally equivalent characters that implementations have to
process will grow enormously, causing a different set of implementation
problems altogether.

AP

----------------------------
Addison Phillips
Director, Technology
SimulTrans, LLC

AddisonP@simultrans.com
+1 650 526-4652

"22 languages. One release date."
----------------------------

schererm@us.ibm.com wrote:

> another note on the sharp s:
>
> the sharp s (ß) was born as a ligature from two different-looking (but
> same-sounding) s-letters. some german "legend" mistakenly says that it is
> an sz-ligature, but that is wrong despite similarities in looks. old
> english writing apparently also had these two s-letters, as i found in a
> facsimile version of the "bill of rights". in german writing, the long s
> disappeared with the ban (1941, by hitler) of the old german script
> (hand-written as well as printed).
>
> so, on one hand, this is one of the ligatures that may be expanded to two
> letters for uppercasing or for other reasons.
>
> on the other hand, to allow expanding it even in lowercase while
> preserving old german (and english) spelling, there is indeed a second
> s-letter in Unicode, latin small letter long s, U+017f. software dealing
> with this may have to be sensitive to languages like "old german" and
> "old english" or to an attribute of "old style european". the sharp s
> U+00df should become U+017f U+0073. the long s is the s used anywhere
> except at the end of a syllable. also, all other s-letters that are not at
> the end of a syllable would have to be converted to U+017f.
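A minimal Python sketch of the old-style conversion just described, applying the rule literally: every lowercase s that is not syllable-final becomes U+017f, and ß becomes U+017f U+0073. The syllable split is a hypothetical input supplied by the caller (a real system would need hyphenation data), so the output is only as good as that split; treat it as an approximation, not a full model of historical long-s usage.

```python
from typing import Sequence

LONG_S = "\u017f"   # LATIN SMALL LETTER LONG S
SHARP_S = "\u00df"  # LATIN SMALL LETTER SHARP S

def to_old_style(syllables: Sequence[str]) -> str:
    """Join pre-split syllables, writing non-final s as long s and
    expanding ß to long s + round s (rule as stated in the post)."""
    out = []
    for syl in syllables:
        chars = list(syl)
        # every lowercase s except the last character of the syllable
        for i in range(len(chars) - 1):
            if chars[i] == "s":
                chars[i] = LONG_S
        # expand ß last, so the round s it produces is not rewritten
        out.append("".join(chars).replace(SHARP_S, LONG_S + "s"))
    return "".join(out)

print(to_old_style(["Häu", "ser"]))   # Häuſer
print(to_old_style(["Haus"]))         # Haus
print(to_old_style(["Fuß", "ball"]))  # Fuſsball
```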
>
> i remember at least one modern example where both s-letters are used: in
> headlines in the big german daily newspaper "Frankfurter Allgemeine
> Zeitung" ("FAZ").
>
> the second s-letter does not have a distinct uppercase equivalent. similar
> to what was discussed here earlier, one may want to make a case for an
> uppercase version that looks just like a normal S, but
> orthography-sensitive software should not need that.
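For comparison, this is how current Unicode casing data (as exposed through Python's str methods and the unicodedata module) treats the long s. It has no capital of its own and uppercases to the ordinary S, so an uppercase round trip is lossy in the same way as for ß.

```python
import unicodedata

LONG_S = "\u017f"

print(unicodedata.name(LONG_S))   # LATIN SMALL LETTER LONG S
print(LONG_S.upper())             # S: maps to the ordinary capital
print(LONG_S.upper().lower())     # s: lowercasing gives round s, not long s
print(LONG_S.lower() == LONG_S)   # True: it is already lowercase
# case folding treats long s as plain s, so caseless matching ignores the difference
print("Häu\u017fer".casefold() == "häuser")  # True
```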
>
> to convert text that has a plain double s (uppercase text, or
> lowercase/normal case without the ligature) to normal case with the sharp
> s ligature ß, software could be language- or style-sensitive and use
> hyphenation rules to detect syllable ends where at least two s letters
> stand next to each other.
>
> for example, "Fussball" has syllables "Fuss" and "ball" and would be
> converted to "Fußball", while "müssen" has syllables "müs" and "sen" and
> does not change.
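A minimal Python sketch of that pre-reform conversion, assuming the syllable split is already known. The split is a hypothetical input here; finding it is the hard part and would need real hyphenation rules. The sketch also assumes normal-case input.

```python
from typing import Sequence

SHARP_S = "\u00df"  # LATIN SMALL LETTER SHARP S

def ligate_sharp_s(syllables: Sequence[str]) -> str:
    """Join syllables, writing a syllable-final double s as ß."""
    out = []
    for syl in syllables:
        if syl.endswith("ss"):
            syl = syl[:-2] + SHARP_S
        out.append(syl)
    return "".join(out)

print(ligate_sharp_s(["Fuss", "ball"]))  # Fußball
print(ligate_sharp_s(["müs", "sen"]))    # müssen (the ss spans a syllable break)
```

Uppercase input such as "FUSSBALL" would first have to be recased, which needs its own language knowledge (German capitalizes nouns).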
>
> part of the recent change in the german orthography rules is to do away
> with the sharp s letter in most, not all, cases and to replace it with two
> normal s letters, but this change is still debated, especially in the state
> of Schleswig-Holstein (the states, not the federation, have authority over
> education and culture). looking at the details, the new rule may actually
> be harder for software that converts two s-letters to the sharp s, because
> hyphenation rules alone no longer suffice: the choice now depends on whether
> the preceding vowel is long or short (e.g. "Fuß" keeps the ß, while "Fluß"
> becomes "Fluss").
>
> http://www.dwelle.de/dw/rechtschreibreform/beispiele/ssss.html
>
>
> mixed news, i guess,
>
> markus (another "native speaker of german" :-)
>
> Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
> scherer@raleigh.ibm.com



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:43 EDT