Re: Regarding canonical combining class value for U+0F76 and similar characters (Unicode 6.2.0)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sat, 18 May 2013 02:02:07 +0200

Yes, it is expected, and in fact it has long been very common in Unicode:
there are many "Mn" marks with combining class 0, and not just in one
script.

A combining class of 0 DOES NOT mean that the character is not a
non-spacing mark (or that it is spacing), only that it blocks reordering
under the standard normalizations and when recognizing canonical
equivalences.
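
To make this concrete, here is a minimal sketch that prints the general
category, the canonical combining class and the decomposition mapping for
the characters asked about. It assumes Python's standard unicodedata module;
the helper name describe() is only illustrative, and the values come from
whichever Unicode version your Python build bundles.

```python
import unicodedata

# The Tibetan vowel signs from the original question.
CODE_POINTS = [0x0F73, 0x0F75, 0x0F76, 0x0F77, 0x0F78, 0x0F79]

def describe(cp):
    """Return (general category, combining class, decomposition mapping)."""
    ch = chr(cp)
    return (unicodedata.category(ch),
            unicodedata.combining(ch),
            unicodedata.decomposition(ch))

for cp in CODE_POINTS:
    gc, ccc, decomp = describe(cp)
    print(f"U+{cp:04X}  gc={gc}  ccc={ccc}  decomposition={decomp!r}")
```

The category and the combining class are reported independently of each
other, which is the point: one does not determine the other.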

(See for example CGJ, which also has combining class 0 and is used mostly
to insert such blocking behavior, without having any real semantic meaning
by itself; once the normalization step has been done, it can be discarded
from the input stream in renderers or collators, except for special
purposes such as rendering CGJ with its own visible glyph in some
"visible controls" edit mode.)
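
The blocking behavior can be sketched as follows, again assuming Python's
unicodedata module. The choice of U+0307 (combining class 230) and U+0323
(combining class 220) is only an illustration; any two marks with different
non-zero classes would do.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0

def nfd(s):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", s)

plain = "a" + DOT_ABOVE + DOT_BELOW
blocked = "a" + DOT_ABOVE + CGJ + DOT_BELOW

# Without CGJ the two marks are sorted by combining class (220 before 230).
reordered = nfd(plain) == "a" + DOT_BELOW + DOT_ABOVE

# With CGJ in between, the class-0 character stops the reordering.
kept = nfd(blocked) == blocked

print("reordered without CGJ:", reordered)
print("order kept with CGJ:  ", kept)
```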

Technically CGJ itself has general category "Mn" rather than "Cf" (format
control), even though it behaves like a formatting control and is
default-ignorable; it participates in the grouping of "default grapheme
clusters" like any other combining mark. For the most part it is an
artefact of the encoding in the UCS, considered foreign to the script by
native writers, but it is also needed for compatibility reasons.
However, in many scripts there exist true combining marks (Mn) that have
combining class 0 (i.e. whose relative ordering in the encoded stream is
semantically significant when they are used in conjunction with other
reorderable combining marks).
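
If you want to see how widespread this is, a rough sketch like the one
below lists every Mn character with combining class 0. It assumes Python's
unicodedata module, which exposes no Script property, so grouping by the
first word of the character name is only my own approximation of the
script (the helper rough_script() is hypothetical). The counts depend on
the Unicode version bundled with your Python.

```python
import sys
import unicodedata
from collections import Counter

def mn_with_ccc_zero():
    """Yield each code point with general category Mn and combining class 0."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mn" and unicodedata.combining(ch) == 0:
            yield cp

def rough_script(cp):
    # Assumption: the first word of the character name approximates the script.
    name = unicodedata.name(chr(cp), "")
    return name.split(" ")[0] if name else "(unnamed)"

found = list(mn_with_ccc_zero())
by_script = Counter(rough_script(cp) for cp in found)

print(len(found), "Mn characters with combining class 0")
for script, count in by_script.most_common(10):
    print(f"{count:5d}  {script}")
```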

General categories are in fact very fuzzy; this is an old property with
lots of caveats if you want to use it alone, and there are newer
orthogonal properties giving more precise behavior for specific algorithms.

General categories have however been kept for compatibility with early
implementations (from when the UCS was much less populated and far fewer
languages were supported, and many algorithms were still not standardized
because the best practices were still not known, or not well understood, or
not yet decided and documented to support these languages).

You'll certainly see other responses on this list giving another point of
view.

2013/5/18 Matt Ma <matt.ma.umail_at_gmail.com>

> Hi,
>
> U+0F76 is a non-spacing combining mark (Mn) but its combining class value
> is defined as 0. Is this expected? The specialty of the character is that
> it is a composition of two combining marks, U+0FB2 and U+0F80.
>
> Same question goes for U+0F73, U+0F75, U+0F77, U+0F78, and U+0F79.
>
> Thanks and regards,
> Matt
>
Received on Fri May 17 2013 - 19:04:52 CDT