Re: Translineating hyphens (was: "Re: Proposing a DOUBLE HYPHEN punctuation mark")

From: Asmus Freytag
Date: Thu Jan 25 2007 - 12:28:47 CST

    Hmm, this looks like we need to address the sequence <SHY, HYPHEN> as
    well as perhaps rules on language-dependent interpretation of HYPHEN
    with or without SHY in UAX#14.

    The way the default algorithm works, getting a hyphen to not break when
    preceded by a SHY adds a certain amount of complexity (perhaps not if
    your engine is based on full regex support, but for other types of

    The best approach, in the context of UAX#14 might be to simply defer
    handling this to a stage right *after* the decision has been made to
    actually break a line at a line break opportunity given by a hyphen.
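
    A minimal sketch of such a post-break stage, under stated assumptions:
    the line breaker has already chosen a break at index i of the text,
    U+002D stands in for any hard hyphen, and the name render_break is
    hypothetical (it is not part of UAX#14 or of any library). If the
    chosen break falls after the hyphen of a <SHY, HYPHEN> pair, the
    sketch moves it back to the SHY, so the hard hyphen is carried to the
    next line as described in the quoted message below.

```python
SHY = "\u00ad"   # SOFT HYPHEN
HYPHEN = "-"     # HYPHEN-MINUS, standing in for any hard hyphen (assumption)


def render_break(text: str, i: int) -> tuple[str, str]:
    """Split text at the break opportunity at index i (a SHY or a hyphen).

    Hypothetical helper: returns (end of first line, start of next line),
    with all soft hyphens that were not used for the break removed.
    """
    ch = text[i]
    if ch == HYPHEN and i > 0 and text[i - 1] == SHY:
        # The break was taken after the hyphen of <SHY, HYPHEN>:
        # move it back to the SHY so the hard hyphen starts the next line.
        i -= 1
        ch = SHY
    if ch == SHY:
        head = text[:i] + HYPHEN   # the soft hyphen becomes visible
        tail = text[i + 1:]        # may begin with the hard hyphen
    elif ch == HYPHEN:
        head = text[:i + 1]        # a plain hard hyphen stays on the first line
        tail = text[i + 1:]
    else:
        raise ValueError("index does not point at a hyphen or soft hyphen")
    return head.replace(SHY, ""), tail.replace(SHY, "")
```

    With this sketch, render_break("dispara\u00ad-te", 8) gives
    ("dispara-", "-te"), while render_break("dispara-te", 7) gives
    ("dispara-", "te"), so the line breaking rules themselves need no
    special case for the pair.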

    Hyphen-based line break opportunities should be post-processed anyway,
    since a layout that avoids them when possible may be considered
    preferable. Not to mention examples like a -a or -b suffix, where you
    don't want to break before the single letter, even though on the
    character level it's legal.
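
    As a rough illustration of that kind of post-processing, the sketch
    below rejects a break after a hyphen when it would strand a single
    letter at the start of the next line. The function name
    keep_hyphen_break and the one-letter threshold are assumptions made
    for the example, not rules taken from UAX#14.

```python
def keep_hyphen_break(text: str, pos: int) -> bool:
    """Decide whether a break before offset pos should be kept.

    Hypothetical filter: pos is the offset at which the next line would
    start. Breaks that do not follow a hyphen are not judged here.
    """
    if pos == 0 or text[pos - 1] != "-":
        return True
    # Measure the run of letters that would open the next line.
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    # Reject the break if exactly one letter follows the hyphen.
    return end - pos != 1
```

    For "option-a and red-blue" this drops the break before "a" and keeps
    the one before "blue". A real layout engine would likely treat this
    as one penalty among several rather than a hard rule.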

    I think I'm going to add a section on Hyphen to the UAX#14 draft. This
    would be a good opportunity to flush out any other unusual behavior of
    that character.


    On 1/25/2007 6:26 AM, António Martins-Tuválkin wrote:
    > On 2007/1/23, Adam Twardoch <> wrote:
    >> On a related note: in Polish typesetting practice, hard
    >> hyphens are always promoted to the next line if soft
    >> hyphens occur in the text. So if I have a sentence "Tam
    >> wisi czerwono-niebieska flaga" and the optimal line
    >> break occurs where the hard hyphen already exists,
    >> the text will be hyphenated like this:
    >> Tam wisi czerwono-
    >> -niebieska flaga.
    > That's exactly what we do in Portuguese; and we do use a lot of
    > hyphens, which are mandatory for half the verb forms including a
    > pronoun.
    > Skilled typesetters and wp users routinely type *each and every
    > hyphen* as a sequence of <soft hyphen> <hard hyphen>, which behave as
    > expected in MS Word, InDesign, PageMaker and QuarkXPress (at least).
    > The golden rule is «Never type a regular hyphen in Portuguese». Bolder
    > types (pun intended) apply this practice when typing other languages,
    > too.
    > Of course unskilled typesetters and wp users (which account for 99.9%
    > of everybody sitting in front of a keyboard) use regular hyphens and
    > even resort to <hyphen> <space> <hyphen> to force the intended
    > behaviour, which comes out very lame should the paragraph reflow, a
    > usual sight even in newspapers and books.
    > This is especially unfortunate since homography and ambiguity may
    > arise: E.g., "_disparate_" means "folly" while "_dispara-te_" means
    > "fire yourself" (or "fires onto you"). The correct way to translineate
    > is:
    > «Lorem ipsum dolor sit amet, disparate consectetur adipisicing»
    > identical to
    > «Lorem ipsum dolor sit amet, dispara-
    > te consectetur adipisicing.»
    > And
    > «Lorem ipsum dolor sit amet, dispara-te consectetur adipisicing»
    > identical to
    > «Lorem ipsum dolor sit amet, dispara-
    > -te consectetur adipisicing.»
    > Use of a regular hyphen yields the same result for both originals,
    > leaving the reader to wonder whether "_disparate_" or "_dispara-te_"
    > is intended. Should <hyphen> <space> <hyphen> be inserted in order to
    > force the expected behaviour, a paragraph reflow made later on will
    > result in:
    > «id est laborum. Lorem ipsum dolor sit amet,
    > dispara- -te consectetur adipisicing.»
    > P.S.: This is no longer about «The "double hyphen" I discuss here
    > consists of two stacked dashes».
    > --
    > António Martins-Tuválkin
    > <antonio(a)>
