L2/07-070

Source: Mark Davis
Subject: Comments on Customary Use Property

=============
Kenneth Whistler
to mark.davis, unicore

>> On Tue Jan 16 2007 Mark Davis wrote:

Well, I missed delivery of that one in the midst of the
email list woes, but have gone back to the email archives now,
and located L2/07-021, and have a number of comments.

1. Customary_Use is a very bad name for this property,
as it will inevitably raise flak from now until eternity
regarding inclusions and exclusions for it, based
simply on the name and regardless of how it is documented.
See, for example, Andrew West immediately pointing out
that Mongolian variation selectors are needed for
"customary use" of Mongolian, to make certain distinctions.

I propose instead that the property name be named in
accordance with its actual target use. One possibility
is simply:

   Internet_ID_Use

That would stop all the argumentation about what is or
is not customary for a particular orthography. And it
would also be absolutely clear to the folks on the
other side of the divide as to which property we were
maintaining in the UCD for specifications like the
IDNA-bis protocol.

2. The exclusion list of historic scripts didn't quite
catch up with the discussion that went on on the
idna-update list about this. You should add Ogham
(sc=Ogam) to the exclusion list, and you should *remove*
(for now at least) Runic (sc=Runr) from that list.
The status of Runic as an historic script is clear,
but there is some sentiment in Northern Europe, at least,
for allowing Runic in internet identifiers. So better,
in my opinion, to allow it in the initial repertoire
defined by the property, so it is visible for discussion.

--Ken

Mark Davis
to Kenneth, unicore Jan 19

Those are good remarks. For the name, I'd rather have something like
Restricted_ID_Use, since while the immediate target might be IDN
(depending on developments in that area), that reflects that the
actual application might be somewhat broader.

Mark

On 1/19/07, Kenneth Whistler <kenw@sybase.com> wrote:

>> On Tue Jan 16 2007 Mark Davis wrote:
>
> Well, I missed delivery of that one in the midst of the
> email list woes, but have gone back to the email archives now,
> and located L2/07-021, and have a number of comments.
>
> 1. Customary_Use is a very bad name for this property,
> as it will inevitably raise flak from now until eternity
> regarding inclusions and exclusions for it, based
> simply on the name and regardless of how it is documented.
> See, for example, Andrew West immediately pointing out
> that Mongolian variation selectors are needed for
> "customary use" of Mongolian, to make certain distinctions.
>
> I propose instead that the property name be named in
> accordance with its actual target use. One possibility
> is simply:
>
>    Internet_ID_Use
>
> That would stop all the argumentation about what is or
> is not customary for a particular orthography. And it
> would also be absolutely clear to the folks on the
> other side of the divide as to which property we were
> maintaining in the UCD for specifications like the
> IDNA-bis protocol.
>
> 2. The exclusion list of historic scripts didn't quite
> catch up with the discussion that went on on the
> idna-update list about this. You should add Ogham
> (sc=Ogam) to the exclusion list, and you should *remove*
> (for now at least) Runic (sc=Runr) from that list.
> The status of Runic as an historic script is clear,
> but there is some sentiment in Northern Europe, at least,
> for allowing Runic in internet identifiers. So better,
> in my opinion, to allow it in the initial repertoire
> defined by the property, so it is visible for discussion.
>
> --Ken




-- Mark

Kenneth Whistler
to andrewcwest, unicore Jan 19

> On 19/01/07, Andrew West <andrewcwest@gmail.com> wrote:
>
>> I must be missing something obvious, but I just don't see why this is
>> so. The stated algorithm is "Generated from
>> Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters
>> #  - White_Space - FFF9..FFFB (Annotation Characters)", and they do
>> not have the Other_Default_Ignorable_Code_Point property in
>> PropList.txt
>
> Well in 4.0.0
> <http://www.unicode.org/Public/4.0-Update/PropList-4.0.0.txt>
> 180B..180D and VS1..VS256 are in the
> Other_Default_Ignorable_Code_Point list, but in 4.0.1
> <http://www.unicode.org/Public/4.0-Update1/PropList-4.0.1.txt> and
> later they are not.


The derivation of Default_Ignorable_Code_Point has been the victim
of creeping perfectionism, as specific lists of code points have
been replaced with particular properties, designated just for such
derivations, because it allegedly would be "clearer" that way.

In the process, the derivations have proceeded correctly, but the
documentation hasn't always caught up with the derivations.

Note that in any event, the Mongolian free variation selectors
have *always* (correctly) been given the Default_Ignorable_Code_Point
property.

--Ken

Details below:

***********************************************************************

Unicode 3.2.0:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from <2060..206F, FFF0..FFFB, E0000..E0FFF>
#    + Other_Default_Ignorable_Code_Point + (Cf + Cc + Cs - White_Space)

180B..180D    ; Other_Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

This is the version that introduced Default_Ignorable_Code_Point as
a property, and used a derivation based on code point ranges.
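
The data lines quoted for each version all follow the same layout: a code point or range, a semicolon, the property name, and a trailing "#" comment. As a rough sketch only (the function name parse_property_line is made up for illustration and is not part of any Unicode tooling), one such line could be read like this in Python:

```python
def parse_property_line(line):
    """Split one UCD-style property line into (first, last, property).

    Returns None for blank lines and for pure comment lines.
    """
    # Everything after '#' is a comment (general category, count, names).
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    range_part, prop = (field.strip() for field in data.split(";", 1))
    if ".." in range_part:
        first, last = range_part.split("..")
    else:
        first = last = range_part  # a single code point, not a range
    return int(first, 16), int(last, 16), prop


entry = parse_property_line(
    "180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] "
    "MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE"
)
```

Here entry holds the range bounds as integers together with the property name, so the same range can be compared across the PropList.txt and DerivedCoreProperties.txt excerpts.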

***********************************************************************

Unicode 4.0.0:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs - White_Space

180B..180D    ; Other_Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

In Unicode 4.0.0, the code point ranges were removed from the
derivation statement, and were moved instead into the definition
of Other_Default_Ignorable_Code_Point.

***********************************************************************

Unicode 4.0.1:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters - White_Space - Annotation_characters

180B..180D    ; Variation_Selector # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

In Unicode 4.0.1, a new property, Variation_Selector, was added
to the standard, and was incorporated into the derivation
of Default_Ignorable_Code_Point. At that time, the variation
selectors were removed from Other_Default_Ignorable_Code_Point,
in part because of a push to ensure that the Other_XYZ properties
were minimized to only include those characters not otherwise
accounted for in the derivations by other properties or lists.

The *problem* is that the comment in the DerivedCoreProperties.txt
file documenting the derivation didn't catch up to the actual
derivation.

***********************************************************************

Unicode 5.0.0:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters
#  - White_Space - FFF9..FFFB (Annotation Characters)

180B..180D    ; Variation_Selector # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

In Unicode 5.0 the comment in DerivedCoreProperties.txt was modified
slightly, but the omission of Variation_Selector from the
specification of the derivation, inherited from the 4.0.1 version
of the file, was overlooked. So the documentation of the
derivation is still incomplete there.
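
To make the gap concrete, here is a minimal Python sketch of the derivation, written two ways: as the comment documents it, and with the Variation_Selector term that the actual derivation uses. It is an illustration under stated assumptions, not the tool that generates DerivedCoreProperties.txt. The range tables are hand-entered approximations for the Unicode 5.0 time frame, other_di stands in for whatever Other_Default_Ignorable_Code_Point contains (passed as an empty list here, since only the variation selectors matter for the point), and the general categories come from Python's unicodedata module, which reflects the Unicode version bundled with the interpreter rather than 5.0.

```python
import unicodedata

# Assumed, hand-entered tables (approximate for the Unicode 5.0 time frame).
VARIATION_SELECTORS = [(0x180B, 0x180D), (0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]
ANNOTATION_CHARACTERS = [(0xFFF9, 0xFFFB)]
WHITE_SPACE = [(0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085),
               (0x00A0, 0x00A0), (0x1680, 0x1680), (0x180E, 0x180E),
               (0x2000, 0x200A), (0x2028, 0x2029), (0x202F, 0x202F),
               (0x205F, 0x205F), (0x3000, 0x3000)]


def in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_noncharacter(cp):
    # FDD0..FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def default_ignorable(cp, other_di, include_variation_selectors):
    """Other_DI + Cf + Cc + Cs + Noncharacters [+ Variation_Selector]
    - White_Space - Annotation Characters."""
    included = (
        in_ranges(cp, other_di)
        or unicodedata.category(chr(cp)) in ("Cf", "Cc", "Cs")
        or is_noncharacter(cp)
        or (include_variation_selectors and in_ranges(cp, VARIATION_SELECTORS))
    )
    excluded = in_ranges(cp, WHITE_SPACE) or in_ranges(cp, ANNOTATION_CHARACTERS)
    return included and not excluded


# As documented in the comment: no Variation_Selector term.
as_documented = default_ignorable(0x180B, [], include_variation_selectors=False)
# As actually derived: Variation_Selector is part of the union.
as_derived = default_ignorable(0x180B, [], include_variation_selectors=True)
```

Because U+180B has general category Mn, none of the documented terms pick it up once it is no longer listed in Other_Default_Ignorable_Code_Point; only the Variation_Selector term does. That matches the data lines above, where 180B..180D carries Default_Ignorable_Code_Point in every version.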

***********************************************************************