RE: Hyphen

From: Phillips, Addison
Date: Sat Jan 17 2009 - 12:32:09 CST

    I just had to try it. Actually, Word will break after any hyphen, including U+2011 (the non-breaking hyphen), at least on my setup. I didn't test it all that thoroughly, but I was surprised to see the results.

    As Jukka notes, UAX #14 makes it very clear that language-specific hyphenation (and some other language-specific break handling) is not covered by the UAX, yet is important for proper line breaking. Section 5.3 gives some good examples. My first thought was that Word was probably implementing language-specific break handling and that this was just a "symptom" of it. I expect Those Responsible will have a more definitive answer at some point.
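
    To make the default behaviour being compared against a little more concrete, here is a minimal Python sketch. It is not the UAX #14 algorithm and says nothing about what Word actually does. The line-break classes are copied by hand from the Unicode LineBreak.txt data (Python's unicodedata module does not expose them), and the function name break_opportunities and its "next character is a letter" rule are simplifying assumptions made for illustration only.

```python
# Line-break classes for a few hyphen characters, hardcoded from the
# Unicode LineBreak.txt data file (assumption: this tiny table is enough
# for illustration; a real implementation needs the full property data).
LINE_BREAK_CLASS = {
    "\u002D": "HY",  # HYPHEN-MINUS
    "\u00AD": "BA",  # SOFT HYPHEN
    "\u2010": "BA",  # HYPHEN
    "\u2011": "GL",  # NON-BREAKING HYPHEN
}


def break_opportunities(text):
    """Return the indexes at which a line may break right after a hyphen.

    Heavily simplified: a break is offered after a BA or HY character
    only when the next character is a letter, and never after a GL
    (glue) character. Spaces, numbers, and every other UAX #14 rule
    are deliberately ignored.
    """
    positions = []
    for i, ch in enumerate(text[:-1]):
        cls = LINE_BREAK_CLASS.get(ch)
        if cls in ("BA", "HY") and text[i + 1].isalpha():
            positions.append(i + 1)
    return positions
```

    With this sketch, break_opportunities("well-known") and break_opportunities("well\u2010known") both return [5], while break_opportunities("well\u2011known") returns an empty list, which is the distinction I expected Word to make for U+2011.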


    Addison Phillips
    Globalization Architect -- Lab126

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: []
    > On Behalf Of Jukka K. Korpela
    > Sent: Saturday, January 17, 2009 2:55 AM
    > To: unicode
    > Subject: Re: Hyphen
    >
    > Russ Stygall wrote:
    >
    > > Microsoft Word12
    >
    > What’s that? I suppose you mean Microsoft Office Word 2007, for which
    > the name “Word 12” is sometimes used, as if Microsoft’s own numbering
    > schemes weren’t confusing enough.
    >
    > > does not agree with UAX#14
    >
    > It’s very far from applying UAX #14. It doesn’t even use Unicode
    > characters for optional and nonbreaking hyphen but its own special
    > codes. It seems to treat U+2010 HYPHEN as a special character with no
    > special line breaking features (no automatic line break after it).
    >
    > > unfortunately!
    >
    > I don’t think any software should implement UAX #14 as such, except
    > programs specifically designed to test the effects of UAX #14. It is
    > absurd, for example, to break the expression “-1” after the
    > HYPHEN-MINUS character.
    >
    > Line breaking in word processors should primarily work by the rules of
    > the language of the text, and use UAX #14 just in special cases, such
    > as breaking a string containing special characters when needed (but
    > not without discretion; for example, I don’t think it’s ever
    > acceptable to break “/usr/var/spool/foobar” after the first occurrence
    > of SOLIDUS).
    >
    > --
    > Yucca,
