Re: unicode Digest V12 #108

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Sat, 2 Jul 2011 21:56:00 +0300

Asmus Freytag wrote:
> On 7/2/2011 8:59 AM, Philippe Verdy wrote:
[...]
>> Why not simply using a soft hyphen between "n" and "g" in this case ?
>> Soft hyphens are normally recognized as such by smart correctors and
>> as well by search engines or collators. It seems enough for me to
>> indicate that this is not the Welsh digram "ng" ; CGJ anyway is
>> certainly not the correct disjoiner in your case.
>
> This solution works well if the word can split between the n and the
> g.

It would still be hackery, since word division is something different from
digraphs, no matter what one really means by “digraph.” It’s a trick
comparable to inserting a left-to-right mark in the hope of making relevant
software treat the characters before and after it as separate, not as
candidates for being treated as a digraph.

A soft hyphen probably has the desired effect, but its meaning is really
something different. In practice, it does not just say that there is a
possible word division point. It may also affect hyphenation so that no
automatic hyphenation is applied in the word at all, or within some distance
from the soft hyphen.

An isolated soft hyphen also introduces a line breaking opportunity that is
often pragmatically odd. In a context where no hyphenation is normally
performed, as on a web page, just throwing in a soft hyphen often causes
that very word to be split, in the midst of otherwise unhyphenated text.
Moreover, in such a situation, the word will be split at the soft hyphen, no
matter how odd that particular division may look in a word with many
word division opportunities.

The moral is: Don’t play with the soft hyphen unless you are prepared to
address the word division problem as a whole and at least check that the
soft hyphen you introduce is the optimal division point for the word, or you
can ensure that better division points will be used when applicable.

> The Danish digraph "aa", normally spelled "å" in modern orthography,
> but retained in names etc. can occur "accidentally" in compound
> nouns, such as "dataanalyse". Adding a SHY is the preferred method to
> indicate that the "aa" is accidental.

While the soft hyphen sits at the optimal division point (between the
components of a compound) in this case, this is not generally true of the
possible use cases. Besides, it may prevent other divisions of the word,
which might otherwise be applied by automatic hyphenation and might really
be needed for good typography.

And there is really no guarantee that programs support the soft hyphen. For
one, Microsoft Word doesn’t; it treats it as just another printable
character. Software that recognizes “words” in some sense, e.g. search
engines, may or may not treat the soft hyphen as ignorable; if it does not,
it treats a word containing a soft hyphen as two words. And so on.
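
To illustrate the "two words" risk, here is a minimal Python sketch. It is
not taken from any real search engine; the function names are made up for
this example, and the naive tokenizer simply assumes that a "word" is a run
of characters matching \w. The soft hyphen (U+00AD) is a format character,
not a letter, so such a tokenizer splits the word at it unless the soft
hyphen is removed first.

```python
import re

SHY = "\u00ad"  # SOFT HYPHEN

def naive_words(text):
    # Hypothetical tokenizer: a "word" is any run of \w characters.
    # U+00AD does not match \w, so it acts as a word separator here.
    return re.findall(r"\w+", text)

def words_ignoring_shy(text):
    # Same tokenizer, but the soft hyphen is treated as ignorable
    # by deleting it before tokenizing.
    return naive_words(text.replace(SHY, ""))

word = "data" + SHY + "analyse"
print(naive_words(word))         # the word is split at the soft hyphen
print(words_ignoring_shy(word))  # the word stays in one piece
```

Which of the two behaviors a given program has cannot be seen from the
text itself, which is the point: the author inserting the soft hyphen has
no control over it.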

We may need to take our chances when we really need discretionary
hyphenation hints. But why take those risks when you don’t really want to
affect hyphenation at all?

I may have missed some parts of the discussion, but I don’t see why you
couldn’t just use the zero-width non-joiner. Using it may carry risks of its
own, but at least you would be dealing with risks related to the original
problem.
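
As a rough sketch of what that would look like in data (Python, with helper
names invented for this example, and with the assumption that software
comparing strings would need to drop format characters itself): the
zero-width non-joiner is U+200C, and like the soft hyphen it has general
category Cf, so the marked word no longer compares equal to the plain one
unless such characters are stripped first.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def mark_accidental(word, index, mark=ZWNJ):
    # Hypothetical helper: insert the mark before word[index].
    return word[:index] + mark + word[index:]

def strip_format_chars(text):
    # Drop every character of general category Cf (format),
    # which includes both U+200C and U+00AD.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

marked = mark_accidental("dataanalyse", 4)       # data + ZWNJ + analyse
print([unicodedata.name(ch) for ch in marked[3:6]])
print(marked == "dataanalyse")                   # False
print(strip_format_chars(marked) == "dataanalyse")  # True
```

Stripping all Cf characters is a blunt choice made only to keep the sketch
short; real software would have to decide which format characters it can
safely ignore.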

Jukka
Received on Sat Jul 02 2011 - 13:59:47 CDT
