From: Asmus Freytag (firstname.lastname@example.org)
Date: Wed Mar 12 2008 - 09:52:26 CST
The name for U+034F is indeed somewhat unfortunate, but it's not a
"mistake" in the usual sense.
Normally, adding a formal alias would therefore not be a proper remedy,
but in this case, your proposal has some hidden virtues. If one were to
add your proposed alias, then the character would be named *both*
separator and joiner, indicating perhaps that its true name should have
been "COMBINING CHARACTER THAT DOES SOMETHING SPECIAL".
Another nice alias might have been "COMBINING CHARACTER WITH UNUSUAL
Jokes aside, there *is* a strong element of a "separator" functionality
inherent in the CGJ. For example, the fact that it has canonical
combining class 0 makes it a separator between other combining marks in
the same combining sequence (each side of the CGJ gets ordered
separately in canonical reordering).
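
To make the reordering point concrete, here is a minimal sketch using Python's standard unicodedata module. The choice of marks (U+0301 COMBINING ACUTE ACCENT, ccc=230, and U+0323 COMBINING DOT BELOW, ccc=220) and the variable names are mine, picked only for illustration:

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
ACUTE = "\u0301"      # combining class 230
DOT_BELOW = "\u0323"  # combining class 220

# The CGJ itself has canonical combining class 0.
print(unicodedata.combining(CGJ))

# Without a CGJ, canonical reordering moves the lower-class mark first.
without_cgj = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)

# With a CGJ in between, each side is ordered separately, so the
# original order of the two marks survives normalization.
with_cgj = unicodedata.normalize("NFD", "a" + ACUTE + CGJ + DOT_BELOW)

print([f"U+{ord(c):04X}" for c in without_cgj])
print([f"U+{ord(c):04X}" for c in with_cgj])
```

In the first case the dot below ends up before the acute; in the second the sequence comes back unchanged.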
That effect of the CGJ is the most definite, most normative effect it
has, because it derives from the formal properties of the character. The
effects that it *might* have in sorting, on the other hand, are all a
matter of convention: you need to tailor your sort tables so that they
recognize the CGJ by giving it a special sort weight, or by giving
sequences without the CGJ a special sorting behavior. Cases like the
"ch" for Slovak example that you cite are relatively "automatic" because
usually, the mere presence of a character not specifically accounted for
in the sorting tables would interrupt the treatment of "ch" as a
contraction. If that character is invisible, you get the correct effect.
(Danes have long used SHY to separate "aa", but that's because a syllable
boundary is usually present there anyway).
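
The "automatic" interruption of a contraction can be sketched with a toy tokenizer. This is not the Unicode Collation Algorithm or any real library's API; the function name collation_units, the CONTRACTIONS table, and the decision to drop the CGJ as ignorable are all assumptions made for illustration (lowercase input only):

```python
CGJ = "\u034f"          # COMBINING GRAPHEME JOINER
CONTRACTIONS = {"ch"}   # hypothetical Slovak-style tailoring

def collation_units(text):
    """Split text into sort units, treating "ch" as a single unit.

    Toy model: a CGJ between "c" and "h" means the pair no longer
    matches the contraction table; the CGJ itself is then skipped
    as if it had no sort weight.
    """
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in CONTRACTIONS:
            units.append(pair)      # contraction matched
            i += 2
        elif text[i] == CGJ:
            i += 1                  # invisible, ignored for sorting
        else:
            units.append(text[i])
            i += 1
    return units

plain = collation_units("chata")
split = collation_units("c" + CGJ + "hata")
print(plain)
print(split)
```

The plain word yields "ch" as one unit, while the version with the CGJ yields "c" and "h" as separate units, which is the effect described above.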
As a mental shorthand, to remind myself of the properties of the CGJ, I
think of it as "INVISIBLE ENCLOSING MARK". (Although it's gc=Mn, for
whatever reason, so it should have been "invisible nonspacing mark with
Whether any of these suggestions would make a good alias (or even formal
alias) I'll let others decide.
On 3/11/2008 9:59 PM, Karl Pentzlin wrote:
> Following the description in p.542 of TUS 5.0, the CGJ
> (i.e. U+034F COMBINING GRAPHEME JOINER) separates graphemes,
> e.g. in Slovak, it prevents a "ch" from being interpreted as a grapheme.
> Thus, the CGJ splits or separates, but does not "join" in any case.
> In the code table, the character has an informative note
> "The name of this character is misleading, it does not actually join
> graphemes", without giving more information.
> Is it appropriate to propose a formal alias like
> "COMBINING GRAPHEME SEPARATOR"?
> - Karl Pentzlin