RE: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK

From: Mount, Rob (Robert F) (rfmount@ingr.com)
Date: Thu Jun 05 2003 - 14:57:12 EDT

Thanks to all who responded. The insight you provided is invaluable. And I
appreciate your patience with a Unicode beginner.

Mark's reference to UCD.html, and by inference to DerivedCoreProperties.txt,
seems definitive. However, these are part of the 4.0 spec. The suspect
implementation of isalpha is based, according to the vendor, on 3.0.1.

The vendor relies, instead, on
http://www.unicode.org/Public/3.0-Update1/PropList-3.0.1.txt
which classifies 30FC as Diacritic, Extender, Bidi: Left-to-Right, and
Identifier Part, but not as Alphabetic. Is this an error in the
specification? I could find no reference to the Alphabetic property in the
3.0.1 documentation except in, and with reference to, PropList-3.0.1.txt.
However, it would seem, from the 4.0 documentation, that all characters
having a General Category beginning with "L" should be considered letters
and hence, implicitly, Alphabetic.
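
A minimal sketch of that reading, in Python purely for illustration (the
implementation in question is a vendor's C library). It assumes the Unicode
data bundled with the interpreter, which is much newer than 3.0.1 or 4.0, so
it shows how a General Category test behaves today, not what either of those
versions specified. The variable names are illustrative.

```python
import unicodedata

ch = "\u30fc"  # KATAKANA-HIRAGANA PROLONGED SOUND MARK

# General Category as recorded in the UCD version bundled with this Python build.
category = unicodedata.category(ch)

# Assumption under test: any category starting with "L" counts as a letter.
is_letter = category.startswith("L")

print(unicodedata.name(ch), category, is_letter, ch.isalpha())
```

Python documents str.isalpha() in terms of the General Category (Lm, Lt, Lu,
Ll, Lo) and not the Alphabetic property, so this answers the "L" question
only, not the PropList one.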

Is this, indeed, the intent of the General Category classifications
beginning with "L"?

Is the omission of 30FC from the Alphabetic category of PropList.txt an
error?
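
One way to check which property lists mention a given code point is to parse
the file directly. The sketch below (Python, illustration only) assumes a
semicolon-separated "code point or range ; property name" layout with "#"
comments; PropList-3.0.1.txt may be laid out differently, in which case the
parser would need adapting. The two sample records are hand-written stand-ins
for the Diacritic and Extender entries mentioned for 30FC, not lines copied
from any release, and parse_properties is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written stand-in records (assumed layout, not copied from a real file).
SAMPLE = """\
30FC          ; Diacritic   # illustrative record
30FC          ; Extender    # illustrative record
"""

def parse_properties(text):
    """Map each code point to the set of property names that list it."""
    props = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        code_range, name = (field.strip() for field in line.split(";", 1))
        first, _, last = code_range.partition("..")  # "XXXX" or "XXXX..YYYY"
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        for cp in range(lo, hi + 1):
            props[cp].add(name)
    return props

props = parse_properties(SAMPLE)
print(sorted(props[0x30FC]))
print("Alphabetic" in props[0x30FC])
```

Run against the real file, the same lookup would show directly whether 30FC
appears under Alphabetic in a given release.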

Rob

-----Original Message-----
From: Mark Davis [mailto:mark.davis@jtcsv.com]
Sent: Thursday, June 05, 2003 9:28 AM
To: Mount, Rob (Robert F); unicode@unicode.org
Subject: Re: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND
MARK

The UCD has a property explicitly called "Alphabetic". So
that should be used when determining whether a character is, well,
alphabetic. See http://www.unicode.org/Public/UNIDATA/UCD.html

However, in the past many people have misused functions like isAlpha()
for doing more complicated processing like determining text boundaries
(line and word breaks, for example). The function isAlpha() does not
discriminate finely enough to be very accurate for processing like
that. For more information, see
http://www.unicode.org/reports/tr14/
http://www.unicode.org/reports/tr29/
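
To make the point about boundaries concrete, here is a small sketch, in
Python for illustration only, of a word splitter built on nothing but an
isalpha-style test. The helper name naive_words is hypothetical.

```python
from itertools import groupby

def naive_words(text):
    """Return the maximal runs of characters for which str.isalpha() is true."""
    return [
        "".join(run)
        for is_alpha, run in groupby(text, key=str.isalpha)
        if is_alpha
    ]

words = naive_words("can't stop")
print(words)  # ['can', 't', 'stop']
```

The apostrophe is not alphabetic, so the contraction is cut in two. This is
the kind of case the word-boundary rules in tr29 are designed to address.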

Also see the proposed update to Unicode Regular Expressions, for
discussion of the use of Unicode properties in connection with alpha,
punct, etc. (in the context of regular expressions, at least).
http://www.unicode.org/reports/tr18/tr18-7.html#Compatibility_Properties

Mark
__________________________________
http://www.macchiato.com
"Eppur si muove"

----- Original Message -----
From: "Mount, Rob (Robert F)" <rfmount@ingr.com>
To: <unicode@unicode.org>
Sent: Wednesday, June 04, 2003 16:11
Subject: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND
MARK

> All,
> I am investigating differing behavior in various environments of the
> wide-character version of the C function isAlpha with respect to
> character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK. Some
> implementations indicate that it is alphabetic, some don't. I
> suspect that other characters might be subject to the same
> confusion.
>
> The Unicode documents seem ambiguous on this point: the General
> Category is "Lm" which, although informative instead of normative,
> would seem to imply that it is alphabetic; likewise
> DerivedCoreProperties-4.0.0.txt indicates that it is alphabetic; but
> PropList-4.0.0.txt contains two records - one indicating that it is
> a diacritic, one that indicates it is an extender.
>
> On to my questions:
>
> Q1: Can a character be both alphabetic and diacritic?
>
> Q2: Is there a definitive answer as to whether this is an alphabetic
> character?
>
> Thanks in advance for answers to these questions and/or any
> additional insight you can provide.
>
> Regards,
> Rob Mount

This archive was generated by hypermail 2.1.5 : Thu Jun 05 2003 - 15:50:49 EDT