Re: CGJ , RLM

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 29 2004 - 11:52:15 CST

    From: "Otto Stolz" <Otto.Stolz@uni-konstanz.de>
    > Note that there is no algorithm to reliably derive the position of the
    > syllable break from the spelling of a Word. You could even concoct pairs
    > of homographs that differ only in the position of the syllable break
    > (and, consequently, in their respective meaning). So far, I have only
    > found the somewhat silly example
    > - "Brief"+SYH+"lasche" (letter flap) vs.
    > - "Brie"+SYH+"flasche" (bottle to keep Brie cheese in),
    > but I am sure I could find better examples if I would try in earnest.

    French hyphenation does not work reliably from orthographic rules alone.
    The rules work quite well, but with many exceptions that require a
    hyphenation dictionary. I think this is also true of almost all
    alphabet-based languages, and even of some languages written with so-called
    "syllabic" scripts, probably as a matter of style, where some separate
    spoken syllables must not be split apart, as those breaks are not the best
    ones according to meaning (notably in compound words).
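
    As a minimal sketch of the "rules plus exception dictionary" approach
    described above: the rule below is a deliberately naive vowel/consonant
    heuristic (an assumption for illustration, not the real French rules), and
    EXCEPTIONS is a hypothetical one-entry dictionary that overrides it.

```python
VOWELS = set("aeiouyàâéèêëîïôöùûü")

# Hypothetical exception dictionary: word -> allowed break offsets.
# "million" must not be broken inside "-lion".
EXCEPTIONS = {"million": [3]}


def rule_breaks(word):
    """Naive orthographic rule (illustrative assumption only).

    Proposes a break between two adjacent vowels (V|V), before a single
    consonant between vowels (V|CV), and between two consonants that are
    flanked by vowels (VC|CV).
    """
    w = word.lower()
    breaks = []
    for i in range(1, len(w)):
        prev_v = w[i - 1] in VOWELS
        cur_v = w[i] in VOWELS
        next_v = i + 1 < len(w) and w[i + 1] in VOWELS
        if prev_v and cur_v:
            breaks.append(i)  # V|V
        elif prev_v and not cur_v and next_v:
            breaks.append(i)  # V|CV
        elif not prev_v and not cur_v and i >= 2 and w[i - 2] in VOWELS and next_v:
            breaks.append(i)  # VC|CV
    return breaks


def hyphenate(word, exceptions=EXCEPTIONS):
    """Return the word with '-' at each break; the dictionary wins over the rule."""
    breaks = exceptions.get(word.lower())
    if breaks is None:
        breaks = rule_breaks(word)
    parts, start = [], 0
    for b in breaks:
        parts.append(word[start:b])
        start = b
    parts.append(word[start:])
    return "-".join(parts)
```

    With the rule alone, "million" would be broken inside "-lion"; the
    dictionary entry is what suppresses that break.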

    The case of German is that there are many possible compound words, and
    breaks preferably occur between radical words rather than between syllables,
    with exceptions:
    - due to other stylistic constraints, or
    - on short particles that are better not detached from their respective
    radical (but where do you best break the verbs "hereinzugehen" or simply
    "zugehen"?),
    - also because not all verb particles are detachable, as some belong to the
    radical (many examples with the "be" particle or radical prefix).

    Even if you allow hyphenation only between lexical units, there will be
    some exceptions that cannot be resolved without understanding the semantics.
    Such compound words with no separator are extremely rare in English, and
    very rare in French.
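
    The ambiguity can be sketched as follows: given a lexicon, enumerate every
    way to split an unseparated compound into lexical units. The four-word
    LEXICON is a hypothetical toy taken from the example quoted above; a real
    one would be far larger, and nothing here picks the intended reading.

```python
# Hypothetical toy lexicon, taken from the quoted example.
LEXICON = {"brief", "lasche", "brie", "flasche"}


def segmentations(word, lexicon=LEXICON):
    """Return every split of `word` into a sequence of lexicon entries."""
    w = word.lower()
    if not w:
        return [[]]  # one way to segment the empty string: no units
    results = []
    for i in range(1, len(w) + 1):
        head = w[:i]
        if head in lexicon:
            # Keep this head and segment the remainder recursively.
            for tail in segmentations(w[i:], lexicon):
                results.append([head] + tail)
    return results
```

    For "Brieflasche" this returns two segmentations, so a break restricted to
    lexical boundaries still has two candidate positions, and only the meaning
    can decide between them.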

    (French examples: there is a clear spoken syllable break in "millionce"
    after "-li-" and before "-on-", pronounced with separate vowels, but in
    "million" no break occurs within "-lions", which is a single syllable
    pronounced with a diphthong; none of these examples is a compound word.)

    But in German, hyphenation is still preferable to breaking only at word
    boundaries (on spaces), because of the average length of compound words:
    without it, margin alignment may look ugly and be hard to read in narrow
    columns such as those of newspapers or dictionaries. In Dutch, there is more
    freedom in the creation of compounds, which can often be written with or
    without a separator (modern Dutch style prefers using separators, or
    avoiding compounds altogether by separating the words with spaces, but
    historically Dutch used the German style, which is still in use today
    despite its possible semantic ambiguities).

    I think that a German writer who sees a possible ambiguity will often
    accept using an unconditional hyphen to create compound words (in your
    example, he would write "Brief-Lasche" or "Brie-Flasche" but not
    "Brieflasche", whose interpretation is problematic because there is no easy
    way to determine it, even given the funny semantics of the two
    alternatives; unless the author is sure that ligatures are handled
    correctly, with a ligature on "fl" for the interpretation as
    "Brie-Flasche", and no ligature but a narrow spacing between f and l for
    the interpretation as "Brief-Lasche").

    (Historically, German texts were full of ligatures, much more so than texts
    in other languages written in the Latin script, though those ligatures now
    tend to disappear from most modern publications. Given the German rule that
    a ligature should not occur between two syllables and should be present
    within the same radical, it is easy to see that ligatures are part of the
    orthographic system and have a semantic value that helps the correct
    understanding of text. So it would be even more important to use ZWNJ or
    ZWJ in German words, rather than letting a renderer do this job
    automatically but inaccurately. For simplicity, I think that ZWNJ inserted
    between radicals to prevent their ligature would be easier to manage than
    ZWJ between two ligaturable letters that must be kept in the same syllable.)
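
    A minimal sketch of the ZWNJ approach, assuming the radicals are already
    known (for instance from a dictionary): join them, inserting U+200C ZERO
    WIDTH NON-JOINER only where the boundary letters form a pair a font might
    ligate. LIGATURE_PAIRS and join_radicals are hypothetical names, and the
    set of pairs is an assumption that depends on the font.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

# Assumed set of letter pairs a font might ligate; real fonts differ.
LIGATURE_PAIRS = {"ff", "fi", "fl"}


def join_radicals(radicals):
    """Join non-empty radicals, blocking ligatures across their boundaries."""
    if not radicals:
        return ""
    out = radicals[0]
    for nxt in radicals[1:]:
        # Only the two letters straddling the boundary matter here.
        if (out[-1] + nxt[0]).lower() in LIGATURE_PAIRS:
            out += ZWNJ
        out += nxt
    return out
```

    With this, ["Brief", "lasche"] yields "Brief" + ZWNJ + "lasche", while
    ["Brie", "flasche"] yields plain "Brieflasche", leaving the renderer free
    to ligate the "fl" inside the second radical.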



    This archive was generated by hypermail 2.1.5 : Mon Nov 29 2004 - 14:44:26 CST