Re: Support of U+00AD SOFT HYPHEN in current software

From: Karl Pentzlin
Date: Sun Aug 16 2009 - 14:58:54 CDT

    On Sunday, 16 August 2009 at 20:37, Jukka K. Korpela wrote:

    JKK> ...
    JKK> Since others (including me) have different experiences, there's something
    JKK> special in your test arrangements.
    JKK> ...

    Revisiting my test data, I found that I was fooled, at least partially,
    by an error in my HTML editor, which I have now been able to reproduce.

    I had used KompoZer 0.7.10 from
    (which I use because I have not yet found any lightweight HTML editor that
    processes non-Win1252 keyboard input correctly).
    I had prepared the test string (the four letters "test" + SHY) in
    BabelMap, copied it with the Copy button, and pasted it several times
    into an empty page in the HTML editor, with UTF-8 configured in the
    options as the default encoding.

    Nevertheless, the page I created has a header containing:
     content="text/html; charset=UTF-8"
    but a bare 0xAD byte at each place where the UTF-8 encoding of SHY
    (the two bytes 0xC2 0xAD) should have occurred.
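
    The mismatch described above can be checked mechanically. The following
    is a minimal Python sketch, not the procedure actually used for the
    tests; the function name invalid_byte_offsets and the sample byte
    strings are made up for illustration. It relies only on Python's strict
    UTF-8 decoder to report where a byte sequence stops being valid UTF-8.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def invalid_byte_offsets(data: bytes) -> list[int]:
    """Return the offsets at which strict UTF-8 decoding of `data` fails.

    Hypothetical helper: after each failure, decoding restarts one byte
    past the reported error position. Good enough for small test files.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            pos += err.start + 1
    return offsets


# Correct UTF-8: SHY is encoded as the two bytes C2 AD.
good = ("test" + SHY).encode("utf-8") * 2

# What the broken page contained, as described: a bare AD byte instead.
broken = b"test\xad" * 2

print(good.hex(" "))
print(invalid_byte_offsets(good))
print(invalid_byte_offsets(broken))
```

    Running such a check on a saved test page before opening it in a browser
    would separate encoding problems in the file from rendering problems in
    the browser.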

    Of course, Microsoft Internet Explorer and Firefox cannot be blamed
    for finding no SHY in the broken UTF-8 sequences generated this way.
    In fact, these programs display correctly encoded text containing SHY
    (like my original mail in the Unicode Mail Archives) correctly.
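
    To illustrate why a decoder has no SHY to find in such data, here is a
    small Python sketch. It shows only how Python's own codecs treat the
    bytes; how a particular browser recovers from invalid UTF-8 is not
    covered here and may differ. The function name decode_views is
    hypothetical, and the guess that the editor wrote the character in a
    single-byte encoding is an assumption, not something verified.

```python
SHY = "\u00ad"          # U+00AD SOFT HYPHEN
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_views(data: bytes) -> tuple[str, str]:
    """Decode `data` leniently as UTF-8 and, for comparison, as Latin-1."""
    as_utf8 = data.decode("utf-8", errors="replace")
    as_latin1 = data.decode("latin-1")
    return as_utf8, as_latin1


broken = b"test\xadtest\xad"  # bare 0xAD bytes, as in the broken page
as_utf8, as_latin1 = decode_views(broken)

# Read as UTF-8, each bare 0xAD becomes U+FFFD, so no SHY is present.
print(SHY in as_utf8, as_utf8.count(REPLACEMENT))

# Read as Latin-1, the same byte is exactly U+00AD. This is consistent
# with (but does not prove) the editor having written a single-byte form.
print(SHY in as_latin1)
```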

    Nevertheless, when I do the same in Word 2007 (hitting Copy in
    BabelPad and pasting the string several times into an empty document),
    and then select all the text and change the font size, I get wrong line
    breaks, as the attached picture shows. When I save this text as an HTML
    page, however, the SHYs appear correctly.

    - Karl Pentzlin

