Re: unicode Digest V12 #108

From: Jukka K. Korpela
Date: Sun, 3 Jul 2011 21:55:29 +0300

Philippe Verdy wrote:

> 2011/7/2 Jukka K. Korpela:
>> And there is really no guarantee that programs support the soft
>> hyphen. For one, Microsoft Word doesn’t—it treats it as just another
>> printable character.
> You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for
> Windows 7) within a random long word (aaaaaaaaaa....) and the SHY is
> recognized to generate the intended hyphenation break.

That’s good news, if your analysis is correct, but the problem still exists
in all Word versions up to and including Word 2007.

> Regarding the previous comment about the Danish "aa", given that
> Danish normal orthography uses å now for all cases where a legacy "aa"
> digram would have been used,

The assumption is incorrect, as “aa” is still used in proper names as per
the old orthography. But I don’t see how we could (and whether we should)
solve the problem at the character level. When people write “Aalborg,” it
might be appropriate to treat it as if spelled “Ålborg” for the purposes of
searching, sorting, etc., but wouldn’t it be better to handle that above the
character level than by introducing invisible control characters? Such
control characters, especially if they are relatively new, can easily create
bigger problems than those that they are supposed to solve. They might look
ideal from some narrow perspective, but considering all the possible ways
that texts might get processed, they are risky.
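
As a rough illustration of handling this above the character level, here is
a minimal Python sketch of a comparison key that treats "aa" as "å" for
searching and sorting. The names fold_aa and danish_sort_key are made up for
this example, and the rule is deliberately crude: it folds every "aa", which
is an assumption that will be wrong for words where "aa" does not stand
for "å".

```python
import unicodedata
from typing import List

# Danish alphabet order: æ, ø and å come after z, with å last.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
_RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def fold_aa(text: str) -> str:
    """Return a lowercase comparison form in which "aa" is treated as "å".

    Hypothetical helper: it folds *every* "aa", which is only an
    approximation of the real orthographic rule.
    """
    folded = unicodedata.normalize("NFC", text).casefold()
    return folded.replace("aa", "å")


def danish_sort_key(text: str) -> List[int]:
    """Map the folded text to alphabet positions; unknown characters sort last."""
    return [_RANK.get(ch, len(DANISH_ALPHABET)) for ch in fold_aa(text)]


if __name__ == "__main__":
    # Both spellings compare equal for search purposes.
    print(fold_aa("Aalborg") == fold_aa("Ålborg"))
    # "Aalborg" sorts under å, i.e. after "Zebra".
    print(sorted(["Aalborg", "Zebra", "Odense"], key=danish_sort_key))
```

The stored text is left untouched; only the key used for comparison changes,
so no invisible characters need to be inserted into the data.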

(Some time ago, I started using soft hyphens on my web pages. While they
work pretty well nowadays as far as web browsing itself is concerned, it is
somewhat embarrassing to see my texts quoted when copy and paste has
replaced each soft hyphen with, e.g., a hyphen-minus or a space. I guess the
risk is still worth taking, as the benefits in normal usage outweigh the
problems. But it would be a different matter to use invisible control
characters without tangible benefits and without reasonable expectations
about how they will be handled.)
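
For what it is worth, when copied text still contains the soft hyphens as
U+00AD, they are easy to remove before quoting. The following Python sketch
shows one way; strip_soft_hyphens is a name invented for this example. It
does not help once a program has already replaced the soft hyphen with a
hyphen-minus or a space, since that replacement cannot be told apart from
genuine hyphens and spaces.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text: str) -> str:
    """Remove every soft hyphen, leaving all other characters untouched."""
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    copied = "hy\u00adphen\u00ada\u00adtion"
    print(strip_soft_hyphens(copied))
```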

Received on Sun Jul 03 2011 - 13:57:02 CDT
