    On Wed, 4 Jun 2003 18:11:48 -0500 , "Mount, Rob (Robert F)" wrote:

    > I am investigating differing behavior in various environments of the
    > wide-character version of the C function isAlpha with respect to
    > The UNICODE documents seem ambiguous on this point: the General
    > Category is "Lm" which, although informative instead of normative,
    > would seem to imply that it is alphabetic; likewise
    > DerivedCoreProperties-4.0.0.txt indicates that it is alphabetic; but
    > PropList-4.0.0.txt contains two records - one indicating that it is
    > a diacritic, one that indicates it is an extender.

    U+30FC (KATAKANA-HIRAGANA PROLONGED SOUND MARK) is, I would say, identical in
    function to U+02D0 (MODIFIER LETTER TRIANGULAR COLON) that is used to indicate a
    long vowel in IPA. Both U+30FC and U+02D0 are signs that are appended to a
    character representing a vowel to indicate that it is a long vowel sound.

    Both U+30FC and U+02D0 have a General Category of "Lm" (Modifier_Letter), and in
    PropList.txt are included under the Extender property. However only U+30FC is
    also included under the Diacritic property. Likewise, U+1843 (MONGOLIAN LETTER
    TODO LONG VOWEL SIGN), which has a similar function to U+30FC, is classified as
    an Extender but not as a Diacritic.
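
    As a rough way to check the General Category values mentioned above, here is
    a minimal Python sketch using the standard unicodedata module. The helper
    name describe() is made up for illustration. Note that the module reports
    data from whatever Unicode version ships with the interpreter, not
    necessarily 4.0.0, and that str.isalpha() is Python's own notion of
    "alphabetic", not the C function's.

```python
import unicodedata

# Code points discussed in this message.
CODE_POINTS = [0x30FC, 0x02D0, 0x1843]


def describe(cp):
    """Return (name, General Category, str.isalpha() result) for a code point.

    Hypothetical helper, for illustration only.
    """
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch), ch.isalpha()


for cp in CODE_POINTS:
    name, category, alpha = describe(cp)
    print(f"U+{cp:04X} gc={category} isalpha={alpha} {name}")
```

    The Diacritic and Extender properties are not exposed by unicodedata, so
    they still have to be read from PropList.txt itself.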

    The definition of "Extender" in UCD.html is:

    "Characters whose principal function is to extend the value or shape of a
    preceding alphabetic character. Typical of these are length and iteration marks."

    U+30FC, U+02D0 and U+1843 are indeed all "length marks", and are rightly
    classified as Extenders.

    But why then is U+30FC alone also classified as a Diacritic (according to
    UCD.html, "Characters that linguistically modify the meaning of another character
    to which they apply")? As far as I am aware, U+30FC does not "linguistically
    modify the meaning of another character" other than to lengthen a preceding vowel.
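
    To make the comparison concrete, the following is a small sketch of how one
    might tabulate properties per code point from data in PropList.txt syntax.
    The SAMPLE text is hand-written and encodes only the classifications stated
    in this message; it is not copied from the real PropList-4.0.0.txt, and the
    function name parse_properties() is an assumption made for illustration.

```python
from collections import defaultdict

# Hand-written sample in PropList.txt syntax: "code point(s) ; Property # comment".
# It reflects only the classifications described in this message and is NOT
# an excerpt from the real PropList-4.0.0.txt.
SAMPLE = """\
02D0 ; Extender # Lm MODIFIER LETTER TRIANGULAR COLON
1843 ; Extender # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
30FC ; Extender # Lm KATAKANA-HIRAGANA PROLONGED SOUND MARK
30FC ; Diacritic # Lm KATAKANA-HIRAGANA PROLONGED SOUND MARK
"""


def parse_properties(text):
    """Map each code point to the set of property names listed for it."""
    props = defaultdict(set)
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue  # blank or comment-only line
        rng, prop = (field.strip() for field in data.split(";"))
        first, _, last = rng.partition("..")  # "XXXX" or "XXXX..YYYY"
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            props[cp].add(prop)
    return props


props = parse_properties(SAMPLE)
for cp in sorted(props):
    print(f"U+{cp:04X}: {', '.join(sorted(props[cp]))}")
```

    Run against the real file instead of SAMPLE, the same loop would show
    whether any other length mark besides U+30FC carries both properties.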


