From: Jukka K. Korpela (email@example.com)
Date: Mon Jan 09 2006 - 13:30:02 CST
On Mon, 9 Jan 2006, Kent Karlsson wrote:
>> Theoretically, U+0132 is a compatibility character with U+0049 U+004A
>> as the compatibility decomposition.
> It has the *standardised* (non-theoretical) decomposition: <compat> 0049 004A
The word "Theoretically" meant that I was first considering how things are in
principle, according to the Unicode Standard.
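The decomposition can be read straight from the Unicode Character Database. Here is a minimal sketch using Python's standard `unicodedata` module, which reflects whatever Unicode version the local Python ships. It shows that U+0132 has no canonical decomposition (NFD leaves it alone) but does have a compatibility decomposition to U+0049 U+004A (NFKD splits it).

```python
import unicodedata

IJ = "\u0132"  # LATIN CAPITAL LIGATURE IJ

# Raw decomposition mapping from the Unicode Character Database:
# a "<compat>" tag followed by the hex code points it decomposes to.
mapping = unicodedata.decomposition(IJ)
print(unicodedata.name(IJ), "->", mapping)

# No canonical decomposition: NFD leaves the character unchanged ...
nfd = unicodedata.normalize("NFD", IJ)
# ... but the compatibility decomposition (NFKD) splits it into I + J.
nfkd = unicodedata.normalize("NFKD", IJ)
print([f"U+{ord(c):04X}" for c in nfd], [f"U+{ord(c):04X}" for c in nfkd])
```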
>> Being a compatibility decomposable
>> character, it is not recommended except in the representation
> No, it does not say that.
"Compatibility decomposable characters are a subset of compatibility
characters included in the Unicode Standard to represent distinctions in
other base standards. They support transmission and processing of legacy
data. Their use is discouraged other than for legacy data or other special
circumstances." (Definition D21 in Section 3.)
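D21 can be turned into a mechanical check. The following is a rough sketch that assumes comparing NFKD with NFD of a single character is an adequate stand-in for comparing its full compatibility and canonical decompositions. The helper name `is_compatibility_decomposable` is hypothetical and does not come from any library.

```python
import unicodedata

def is_compatibility_decomposable(ch: str) -> bool:
    """Approximate Unicode definition D21 for a single character.

    A character counts as compatibility decomposable when its compatibility
    decomposition (approximated by NFKD) differs from its canonical
    decomposition (approximated by NFD).
    """
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)

# IJ ligature and long s (compatibility), ae (plain letter),
# A with ring above (canonical decomposition only).
for ch in ["\u0132", "\u017F", "\u00E6", "\u00C5"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_compatibility_decomposable(ch))
```

The IJ ligature and LONG S come out as compatibility decomposable. U+00C5 does not, because its canonical and compatibility decompositions coincide.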
> There are exceptions to that interpretation
> of compatibility characters (and compatibility decomposable characters),
> the IJ LIGATURE and the LONG S are among them. I think it is perfectly
> fine to recommend their use in situations like this
I think so too; we seem to agree on the practical point. But I was
discussing what the standard says (the statement appears in a somewhat odd
place, but the same general idea can be seen elsewhere in the standard, too).
>> Note that although the U+0132 indicates a ligature character, its
>> decomposition does not include U+200D (word joiner) or any other
> 200D is ZERO WIDTH JOINER, 2060 is WORD JOINER. Neither is used in any
> decomposition mapping except for themselves.
Right. (I was wondering whether I should mention the difference, and of
course the wrong name crept into my text. :-( )
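Both halves of Kent's correction can be checked against the character database. Here is a small sketch with Python's `unicodedata`, assuming its bundled Unicode version is representative. It confirms the two names, then scans every code point for a decomposition mapping that mentions U+200D or U+2060. The helper `chars_decomposing_to` is hypothetical.

```python
import sys
import unicodedata

JOINERS = {0x200D: "ZERO WIDTH JOINER", 0x2060: "WORD JOINER"}

# Confirm the character names.
for cp, expected in JOINERS.items():
    assert unicodedata.name(chr(cp)) == expected

def chars_decomposing_to(targets):
    """Return code points whose decomposition mapping mentions any target."""
    wanted = {f"{cp:04X}" for cp in targets}
    hits = []
    for cp in range(sys.maxunicode + 1):
        fields = unicodedata.decomposition(chr(cp)).split()
        # Skip a leading "<tag>" such as "<compat>" before comparing.
        if any(f in wanted for f in fields if not f.startswith("<")):
            hits.append(cp)
    return hits

print(unicodedata.unidata_version, chars_decomposing_to(JOINERS))
```

The scan takes a moment because it visits all 1,114,112 code points.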
> ZWJ could be used to "recommend" the use of a typographic ligature, but
> should not (IMO) be used to form *orthographic* ligatures
The Unicode Standard makes no such distinction, and, as you mention,
the IJ ligature would be a borderline case anyway.
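One practical consequence is that normalization does not treat a ZWJ sequence as equivalent to the ligature character. Here is a sketch using Python's `unicodedata`, assuming text is compared after NFKC folding, as many search tools do. U+0132 folds to plain "IJ", while "I" + ZWJ + "J" keeps the joiner, so the two spellings would not match each other.

```python
import unicodedata

ligature = "\u0132"   # LATIN CAPITAL LIGATURE IJ (compatibility character)
joined = "I\u200DJ"   # I + ZERO WIDTH JOINER + J (ligature requested, not required)
plain = "IJ"

for label, s in [("ligature", ligature), ("ZWJ", joined), ("plain", plain)]:
    folded = unicodedata.normalize("NFKC", s)
    print(label, [f"U+{ord(c):04X}" for c in folded])
```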
Especially considering that the ij ligature is classified as a letter
in Dutch, we might say that it should really have been defined as a
primary (non-compatibility) character, in much the same way as the oe
ligature and the ae ligature (which is now even called "letter ae"
rather than a ligature, though it is still effectively used as a ligature, too).
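The contrast shows up directly in the character data. Here is a short sketch with Python's `unicodedata`: æ and œ have no decomposition mapping at all, so NFKC leaves them intact, whereas ĳ carries a `<compat>` mapping and folds to "ij".

```python
import unicodedata

# ae and oe are encoded as primary characters without decompositions;
# the ij ligature carries a compatibility decomposition to i + j.
for ch in ["\u00E6", "\u0153", "\u0133"]:
    mapping = unicodedata.decomposition(ch) or "(none)"
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), mapping, folded)
```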
But it's too late to change that now. (Maybe some official statement,
constituting an explicit exception to the principle of avoiding
compatibility decomposable characters, would be in order.)
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
This archive was generated by hypermail 2.1.5 : Mon Jan 09 2006 - 13:31:58 CST