Re: unicode Digest V12 #108

From: Philippe Verdy <>
Date: Sun, 3 Jul 2011 15:31:41 +0200

2011/7/2 Jukka K. Korpela <>:
> And there is really no guarantee that programs support the soft hyphen. For
> one, Microsoft Word doesn’t—it treats it as just another printable
> character.

You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for
Windows 7) within a random long word (aaaaaaaaaa....) and the SHY is
recognized to generate the intended hyphenation break.

And SHY does not break the spell checker in a long non-random
word (I tried within "anticonstitutionnellement", the longest word in
French). It can effectively be used in a discretionary way, including
at non-canonical positions where the default hyphenator proposes some
other position or does not hyphenate at all.

However, your point is correct: SHY is not an *orthographic*
character. It is strictly a formatting character intended as a hint
for the typesetting of documents.
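
To illustrate what "not orthographic" means in practice, here is a
minimal Python sketch. It assumes that a program wanting the plain
spelling of a word (for spell checking or comparison, say) may simply
drop U+00AD; the helper name strip_soft_hyphens is mine, not part of
any library.

```python
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed (hypothetical helper)."""
    return text.replace(SHY, "")

# The same word as in my test above, with two discretionary break hints.
word = "anti" + SHY + "constitution" + SHY + "nellement"

# SHY is classified as a format control (general category Cf), not a letter.
print(unicodedata.category(SHY))  # Cf

# Once the hints are removed, the orthographic spelling is unchanged.
print(strip_soft_hyphens(word) == "anticonstitutionnellement")  # True
```

Whether a given application actually does this before spell checking
is up to that application; the sketch only shows that the hint can be
removed without touching the spelling.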

Regarding the previous comment about the Danish "aa", given that
Danish normal orthography now uses å for all cases where a legacy "aa"
digram would have been used, there's no need to insert any format
control for other accidental occurrences of "aa" as separate letters:
the default for Danish is certainly to disable the recognition of the
legacy digram "aa" if "å" is usable directly in the same context.

The legacy use in Danish would have been old ASCII-encoded texts. But
anyway, in this context you would not even have any format control, and
no choice but to leave the ambiguity about the digram.

Note that I also don't think that it's necessary to specially encode
any joiner or disjoiner control in the middle of candidate
digrams/trigrams. If it is used, it must be discretionary within
specific documents (already typeset in their rich-text format), and
such control should be clearly ignorable in case the text was exported
and reimported into another document.
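
As a rough sketch of what "ignorable on export" could look like, the
Python below drops a discretionary control only when it sits between two
Latin letters. Everything here is an assumption made for illustration:
the set of controls (SHY, ZWNJ, ZWJ), the Latin-only restriction, and
the function names are mine. The restriction is there because ZWNJ and
ZWJ are meaningful in other scripts and should not be stripped blindly.

```python
import unicodedata

# Assumed set of discretionary controls: SHY, ZWNJ, ZWJ.
DISCRETIONARY = {"\u00ad", "\u200c", "\u200d"}

def is_latin_letter(ch: str) -> bool:
    """True if ch is a letter whose Unicode name mentions LATIN."""
    return ch.isalpha() and "LATIN" in unicodedata.name(ch, "")

def strip_discretionary(text: str) -> str:
    """Drop a discretionary control only between two Latin letters."""
    out = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        if (ch in DISCRETIONARY and 0 < i < last
                and is_latin_letter(text[i - 1])
                and is_latin_letter(text[i + 1])):
            continue  # ignorable hint inside a Latin-script word
        out.append(ch)
    return "".join(out)

# A Danish compound where "aa" is two separate letters, marked with ZWNJ.
marked = "ekstra\u200carbejde"
print(strip_discretionary(marked) == "ekstraarbejde")  # True

# The same control between Arabic-script letters is left untouched.
persian = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
print(strip_discretionary(persian) == persian)  # True
```

After export the Danish word is plain "ekstraarbejde" again, which is
exactly the state an old ASCII-encoded text would have been in.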
Received on Sun Jul 03 2011 - 08:36:17 CDT
