Cursor-Movement was: German Umlaut and other precomposed characters

From: Peter R. Mueller-Roemer
Date: Tue Apr 26 2005 - 08:38:52 CST

    Several precomposed letters are necessary for easy typing in German,
    Scandinavian, Greek, Hebrew ... see the different keyboard layouts.
    Unicode was right to introduce combining diacritical marks, so that you
    don't necessarily have to use a different keyboard just to enter a
    few words in another language.
    But why do all the precomposed Hebrew dagesh consonants refuse to be
    composed with vowel points AND cantillation marks? You can't even copy
    the first word of the Masoretic Bible!
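    One related fact can be checked with Python's standard unicodedata
    module: the precomposed Hebrew presentation forms such as U+FB31 are
    excluded from canonical composition, so normalization to NFC always
    replaces them by the base consonant plus separate marks. The sketch
    below only shows this normalization behaviour; whether it explains the
    editor or font problem described above is an assumption. The helper
    names nfc and code_points are made up for the illustration.

```python
import unicodedata

BET_WITH_DAGESH = "\uFB31"   # precomposed HEBREW LETTER BET WITH DAGESH
BET = "\u05D1"               # HEBREW LETTER BET
DAGESH = "\u05BC"            # HEBREW POINT DAGESH OR MAPIQ
SHEVA = "\u05B0"             # HEBREW POINT SHEVA (a vowel point)


def nfc(text):
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The precomposed letter does not survive NFC: it is a composition exclusion.
print(code_points(nfc(BET_WITH_DAGESH)))

# Precomposed letter + vowel point: the letter is decomposed and the marks
# are put into canonical order (sheva before dagesh).
print(code_points(nfc(BET_WITH_DAGESH + SHEVA)))

# Both spellings of "bet with dagesh and sheva" normalize to the same string.
print(nfc(BET_WITH_DAGESH + SHEVA) == nfc(BET + DAGESH + SHEVA))
```

    In other words, under normalization the consonant, the dagesh and the
    vowel point are always handled as one combining sequence, whichever
    way they were typed.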
    What we need, though, are good editors that can not only compose characters
    with several diacritical marks without overtyping (e.g. acute + grave
    should not merge into a smudgy little x), but also find such a sequence
    and replace it by a single precomposed letter, with the result that composed
    sequences are counted as single characters for cursor movement. There
    should also be an easy way (e.g. by AltGr + arrow) to enter into any
    precomposed letter to insert or delete any marks.
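    A minimal sketch of both wishes, assuming Python's standard unicodedata
    module: NFC normalization replaces a sequence by a precomposed letter
    where one exists, and a simple rule (a base character plus all following
    characters of general category M) groups the rest for cursor movement.
    This rule is only an approximation of the grapheme clusters defined in
    Unicode Standard Annex #29, and the function names clusters and
    next_cursor are invented for the example.

```python
import unicodedata


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def clusters(text):
    """Split text into units of one base character plus its marks."""
    units = []
    for ch in text:
        if units and is_mark(ch):
            units[-1] += ch      # attach the mark to the preceding base
        else:
            units.append(ch)     # start a new unit
    return units


def next_cursor(text, pos):
    """Return the index reached by one 'cursor right' step from pos."""
    i = pos + 1
    while i < len(text) and is_mark(text[i]):
        i += 1                   # skip over the marks of the current unit
    return min(i, len(text))


typed = "a\u0308b"               # a + COMBINING DIAERESIS, then b
composed = unicodedata.normalize("NFC", typed)
print(len(typed), len(composed))           # the sequence becomes one letter
print(clusters(typed), next_cursor(typed, 0))

# Acute + grave on one base: no single precomposed letter exists, but the
# whole sequence is still one unit for cursor movement.
stacked = unicodedata.normalize("NFC", "e\u0301\u0300")
print(len(stacked), len(clusters(stacked)))
```

    Entering "into" a letter to edit its marks could then be built on the
    same units, by stepping through the characters of one unit instead of
    skipping over them.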
    Unicode might not like to address the standardization of cursor movement in
    multilingual texts with RtoL and LtoR entry, or the shaping and editing of
    combining sequences.
    Leaving it to the individual editor providers will cause headaches for
    those who have to use various OS and SW on several computers.
    There should be a technical committee of concerned parties to provide at
    least guidelines for shaping, editing, and navigating over and within
    combining sequences and in bidirectional texts. The present state is very

    Peter R. Mueller-Roemer

    Hans Aberg wrote:

    > At 15:48 +0200 2005/04/25, Otto Stolz wrote:
    >> you have written:
    >>> The Swedish language symbol (a with two dots above) is a separate
    >>> letter, not to be viewed as an alteration of the letter a. So it is
    >>> atomic. It is reasonable to enter it as a separate character. In
    >>> German, however it is an umlaut, alteration of the letter a.
    >> Not quite so: It has its own phonetic value (almost equal to its
    >> Swedish sibling, IIRC), and is taught as a separate character in schools
    >> (believe me, I am German and interested in linguistic issues, and my
    >> wife is a teacher at an elementary school).
    >> The term "Umlaut" for a class of characters does not render these
    >> umlauts as non-characters. There is a similar term, "Ablaut", e. g.
    >> for the "a" and "o" in "barst" and "geborsten" (from "bersten") --
    >> yet, this does not qualify "a" and "o" as non-characters, alterations
    >> of "e".
    > Let's take it easy: I attempted to make a formal definition of the
    > notion of an abstract character, not to be confused with the many
    > possible intuitive notions of a character. When defining an abstract
    > character, I suggested that it should be a linguistic semantic unit
    > that in some sense or another is atomic. There, the point is that
    > symbols like ä can be atomized in more ways than one: It could be
    > viewed as a whole, indivisible unit, or as a composite of more than one
    > character. The choice may depend on the context.
    > The second point, though, is that the preference for viewing larger
    > symbols as a single character, as regards computer software,
    > is probably due to limitations of that software. It would
    > probably be better, implementation-wise, to always represent
    > symbols like ä as a combination of smaller, abstract characters, as a
    > sufficiently smart computer program can always recognize the Swedish
    > or German letter ä, and give it the proper handling, and as we now
    > know that representing characters in a single byte or in two bytes will
    > not suffice anyhow.
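    The decomposed representation suggested in the quoted text can be tried
    with Python's standard unicodedata module. The sketch below assumes that
    "recognizing the letter" means comparing strings after canonical
    decomposition (NFD); the function names canonically_equal and
    base_and_marks are hypothetical.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def base_and_marks(letter):
    """Split one letter into its base character and its combining marks."""
    decomposed = unicodedata.normalize("NFD", letter)
    return decomposed[0], decomposed[1:]


precomposed = "\u00E4"           # LATIN SMALL LETTER A WITH DIAERESIS
sequence = "a\u0308"             # a + COMBINING DIAERESIS

# Both spellings are recognized as the same letter.
print(canonically_equal(precomposed, sequence))

# The single precomposed letter is internally handled as base + mark.
base, marks = base_and_marks(precomposed)
print(base, [unicodedata.name(m) for m in marks])
```

    Whether the letter is then treated as atomic (Swedish) or as an
    altered a (German) is left to the program that uses the decomposed
    form.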

    This archive was generated by hypermail 2.1.5 : Tue Apr 26 2005 - 08:40:12 CST