RE: Orthographies using ZWNJ (was: Displaying control characters)

From: Philippe Verdy (
Date: Mon Jul 23 2007 - 22:51:57 CDT


    Asmus Freytag wrote:
    > Rest assured, the WJ would be quite incorrect. The fact that you keep
    > repeating this indicates that you did not read the standard or any of my
    > other posts.

    Rest assured that I read the standard and did not find any rationale about
    the use or semantics of WJ, which was introduced only much later to replace
    the deprecated ZWNBSP (now used as a BOM), compared to ZWNJ.

    I absolutely don't care about the linguistic definitions of "syllables"
    because this cannot be treated at the encoding or local orthographic level
    without the help of some language-specific dictionary. These linguistic
    syllables are NOT a property of the script with which these languages are
    written, and Unicode does not encode the languages, so it cannot treat them.
    However, it's up to Unicode to define the way a script can be encoded to
    specify essential things like the prohibition or preference of ligatures, or
    the prohibition or suggestion of "syllable" breaks.

    Yes, English lacks a correct word for saying "syllable breaks", i.e. the
    fact that some places in a word can be used to split it to separate lines,
    possibly also adding some visible mark when this occurs. What I really mean
    by "syllable break" in ALL that I have written so far is what is meant by
    the much more precise French term "césure".

    You tried to use the term "word breaking", but this term seems wrong too for
    this usage: for me, word breaking is the act of splitting a text into
    separate words, not the act of finding possible breaks within a word.

    All your misunderstanding of what I meant (suggesting that I wanted to
    redefine things, which I do not) is caused by a misreading of the
    English expression "syllable break". Read it as the French term "césure",
    which is much better than "syllable break" (even though no "césure" can
    occur in the middle of a linguistic syllable in French).

    And yes, I know that a césure is *preferably* not used in every place (but
    absolutely NOT forbidden), for stylistic reasons (in French it is preferable
    not to insert a césure after the prefixes "con-", "cul-",... or in the
    middle of "coha-bite" for the same reasons that it would be read as

    I say "preferably", because there are frequent cases where this use is
    wanted by authors, notably in poetry and song lyrics (where the
    césure is made audible by the rhythm or the melody), but also in the most
    vernacular usage. Look at the French article about "césure" in Wikipédia;
    you'll find some external references about these funny césures used
    purposely in songs, the best-known cases in France being those from
    Serge Gainsbourg, who was known to have an excellent mastery of
    correct French (even though his language was perceived as crude and
    shocking in the 1960s). I'm sure that such authors also exist in other
    cultures, and that playing with the overly strict, commonly
    accepted rules of a language is wanted in every culture that does not want
    to restrict the language to formal uses only.

    So even if a language is preferably not rendered with these generally
    undesired césures, or preferably does not leave a short syllable of
    just one or two letters alone for typographic reasons, such césures are
    NOT considered incorrect for the language itself, where preferences of style
    are left as a choice for the author.

    Now let's get back to Unicode and text encodings: how can an author specify
    simultaneously in the text where ligatures can or cannot occur, and where
    césures can or cannot occur? And if one occurs, how must the césure be
    presented? (Standard hyphenation with a hyphen mark at the end of the first
    line, as implied by SHY, is not the only option, and even Latin-written
    languages have other requirements about how a césure should be presented.)
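
To make the question concrete, here is a minimal Python sketch of the format characters named in this thread, showing that a break hint (SHY) and a ligature hint (ZWNJ) can coexist in one string. It only illustrates the encoding side and says nothing about how a renderer must present the césure. The helper names (`describe`, `soft_hyphen_breaks`) and the sample strings are assumptions made for illustration.

```python
import unicodedata

# Format characters mentioned in this thread (code points as assigned in Unicode).
SHY = "\u00AD"     # SOFT HYPHEN: marks a permitted break inside a word
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER: asks that two letters not join/ligate
ZWJ = "\u200D"     # ZERO WIDTH JOINER: asks for a joined/ligated form
WJ = "\u2060"      # WORD JOINER: prohibits a line break at this position
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE: now used mainly as the BOM


def describe(ch):
    """Return the code point, character name, and general category of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"


def soft_hyphen_breaks(text):
    """Strip SHY from text; return the stripped text and the break offsets in it."""
    plain = []
    breaks = []
    for ch in text:
        if ch == SHY:
            breaks.append(len(plain))  # a break is allowed before the next letter
        else:
            plain.append(ch)
    return "".join(plain), breaks


if __name__ == "__main__":
    for ch in (SHY, ZWNJ, ZWJ, WJ, ZWNBSP):
        print(describe(ch))

    # Hypothetical sample: the author permits breaks at these three positions.
    print(soft_hyphen_breaks("co" + SHY + "ha" + SHY + "bi" + SHY + "te"))

    # Hypothetical sample: a break hint and a "no fi ligature" hint in one word.
    mixed = "of" + SHY + "f" + ZWNJ + "ice"
    print([f"U+{ord(c):04X}" for c in mixed])
```

Note that SHY carries only "a break is permitted here"; how the break is displayed is left to the rendering process, which is exactly the gap the question above points at.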
