SHY, CGJ, etc. (was: Re: unicode Digest V12 #108)

From: <>
Date: Sun, 3 Jul 2011 04:07:09 +0000

I'm a bit concerned about the implication that correctly encoded Unicode text in Breton, Welsh, etc. needs to be sprinkled liberally with SHY, CGJ, or other invisible formatting characters to resolve any possible ambiguity in these languages' orthographies. That is like saying English text needs a SHY at every potential hyphenation point so that text processors don't have to use a dictionary to hyphenate.

I can easily see this thread being misinterpreted or taken out of context by newcomers, or reposted or blogged by someone eager to make a point about unneeded complexity in Unicode. Really, for 99.9% of applications, shouldn't we just write the letters?


-----Original Message-----
From: Asmus Freytag <>
Date: Sat, 02 Jul 2011 10:02:03
To: <>
Cc: Andrew Miller<>; <>
Subject: Re: unicode Digest V12 #108

On 7/2/2011 8:59 AM, Philippe Verdy wrote:
> 2011/7/2 Andrew Miller<>:
>> The "ng" in Llangollen is not the digram "ng" but two separate letters
>> (unlike the "ll" in the name which is the digram).
> Why not simply use a soft hyphen between "n" and "g" in this case?
> Soft hyphens are normally recognized as such by smart correctors, as
> well as by search engines and collators. It seems enough to me to
> indicate that this is not the Welsh digram "ng"; CGJ anyway is
> certainly not the correct disjoiner in your case.
This solution works well if the word can split between the n and the g.

In fact, if such a split is possible, I would call it the preferred
solution for indicating an "accidental" digraph.

An example:

The Danish digraph "aa", normally spelled "å" in modern orthography but
retained in names etc., can occur "accidentally" in compound nouns, such
as "dataanalyse". Adding a SHY between the two letters is the preferred
method to indicate that the "aa" is accidental.
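
To make the Danish example concrete, here is a minimal Python sketch. The
helper names (mark_accidental_digraph, strip_shy) are made up for
illustration and come from no library; the split position is supplied by
hand, since finding it automatically would need a dictionary.

```python
import unicodedata

SHY = "\u00AD"  # U+00AD SOFT HYPHEN, an invisible format character


def mark_accidental_digraph(word: str, index: int) -> str:
    """Insert a SHY before word[index] (hypothetical helper).

    The caller decides where the compound boundary falls.
    """
    return word[:index] + SHY + word[index:]


def strip_shy(text: str) -> str:
    """Remove every SHY, recovering the plain spelling (hypothetical helper)."""
    return text.replace(SHY, "")


# "data" + "analyse": the boundary sits between the two a's, at index 4.
marked = mark_accidental_digraph("dataanalyse", 4)

print(unicodedata.name(SHY))   # the character's formal Unicode name
print("aa" in marked)          # the letter pair is no longer adjacent
print(strip_shy(marked))       # the plain spelling is recoverable
```

A process that only wants the letters can drop the SHY again, so the
marked and unmarked spellings remain easy to compare.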

Other characters may have the same effect of breaking the digraph, but
their use might require an *additional* SHY to be inserted if and when a
linebreak opportunity needs to be manually marked (say, for an unusual
compound not recognized by the automatic hyphenator). It would be bad to
have to have *two* invisible characters at that location.
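
A small Python sketch of the "two invisible characters" situation. It
assumes, purely for illustration, that the other digraph-breaking
character is CGJ (U+034F), since that is the one raised in this thread;
the helper invisible_count is hypothetical.

```python
SHY = "\u00AD"  # U+00AD SOFT HYPHEN
CGJ = "\u034F"  # U+034F COMBINING GRAPHEME JOINER (assumed alternative breaker)

INVISIBLE = {SHY, CGJ}


def invisible_count(text: str) -> int:
    """Count the invisible marks from INVISIBLE in text (hypothetical helper)."""
    return sum(ch in INVISIBLE for ch in text)


# SHY alone breaks the letter pair and marks the hyphenation point.
shy_only = "data" + SHY + "analyse"

# A different breaker plus a manually marked break needs both characters.
cgj_plus_shy = "data" + CGJ + SHY + "analyse"

print(invisible_count(shy_only))
print(invisible_count(cgj_plus_shy))
```

Both strings show the same letters to a reader, but the second carries
an extra invisible character at the same spot.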

Received on Sat Jul 02 2011 - 23:10:19 CDT
