L2/07-070

Source: Mark Davis
Subject: Comments on Customary Use Property
=============

Kenneth Whistler to mark.davis, unicore

> On Tue Jan 16 2007 Mark Davis wrote:

Well, I missed delivery of that one in the midst of the email list woes, but I have gone back to the email archives now, located L2/07-021, and have a number of comments.

1. Customary_Use is a very bad name for this property, as it will inevitably draw flak from now until eternity regarding inclusions and exclusions, based simply on the name and regardless of how it is documented. See, for example, Andrew West immediately pointing out that Mongolian variation selectors are needed for "customary use" of Mongolian, to make certain distinctions.

I propose instead that the property be named in accordance with its actual target use. One possibility is simply:

   Internet_ID_Use

That would stop all the argumentation about what is or is not customary for a particular orthography. It would also be absolutely clear to the folks on the other side of the divide which property we were maintaining in the UCD for specifications like the IDNA-bis protocol.

2. The exclusion list of historic scripts didn't quite catch up with the discussion that went on on the idna-update list about this. You should add Ogham (sc=Ogam) to the exclusion list, and you should *remove* (for now at least) Runic (sc=Runr) from that list. The status of Runic as an historic script is clear, but there is some sentiment, in Northern Europe at least, for allowing Runic in internet identifiers. So it is better, in my opinion, to allow it in the initial repertoire defined by the property, so that it is visible for discussion.

--Ken

Mark Davis to Kenneth, unicore, Jan 19

Those are good remarks.
For the name, I'd rather have something like Restricted_ID_Use: while the immediate target might be IDN (depending on developments in that area), that name reflects that the actual application might be somewhat broader.

Mark

Kenneth Whistler to andrewcwest, unicore, Jan 19

> On 19/01/07, Andrew West wrote:
>>
>> I must be missing something obvious, but I just don't see why this is
>> so. The stated algorithm is "Generated from
>> Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters
>> # - White_Space - FFF9..FFFB (Annotation Characters)", and they do
>> not have the Other_Default_Ignorable_Code_Point property in
>> PropList.txt
>
> Well, in 4.0.0 <http://www.unicode.org/Public/4.0-Update/PropList-4.0.0.txt>
> 180B..180D and VS1..VS256 are in the
> Other_Default_Ignorable_Code_Point list, but in 4.0.1 and
> later they are not.

The derivation of Default_Ignorable_Code_Point has been the victim of creeping perfectionism: specific lists of code points have been replaced with particular properties, designated just for such derivations, because it allegedly would be "clearer" that way. In the process, the derivations have proceeded correctly, but the documentation hasn't always caught up with them.

Note that in any event, the Mongolian free variation selectors have *always* (correctly) been given the Default_Ignorable_Code_Point property.

--Ken

Details below:

***********************************************************************

Unicode 3.2.0:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from <2060..206F, FFF0..FFFB, E0000..E0FFF>
#    + Other_Default_Ignorable_Code_Point + (Cf + Cc + Cs - White_Space)

180B..180D    ; Other_Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

This is the version that introduced Default_Ignorable_Code_Point as a property, and it used a derivation based on code point ranges.
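The data lines in these excerpts follow the usual UCD data-file convention: a code point or a range, a semicolon, the property name, and an optional comment after "#" (general category, count, character names). As an illustration only, here is a minimal Python sketch of reading one such line, assuming just that convention; the function name parse_ucd_line is hypothetical and not part of any Unicode tooling.

```python
from typing import Optional, Tuple


def parse_ucd_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Parse one PropList/DerivedCoreProperties-style data line.

    Hypothetical helper: returns (first, last, property_name), or None
    for blank lines and comment-only lines such as the derivation header.
    """
    # Drop the trailing comment first, so the ".." inside character
    # names (e.g. "...SELECTOR ONE..MONGOLIAN...") is never misread as a range.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    code_points, prop = fields[0], fields[1]
    # The first field is either a single code point ("200D")
    # or an inclusive range ("180B..180D").
    if ".." in code_points:
        first_hex, last_hex = code_points.split("..")
    else:
        first_hex = last_hex = code_points
    return int(first_hex, 16), int(last_hex, 16), prop


line = ("180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] "
        "MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE")
first, last, prop = parse_ucd_line(line)
print(f"U+{first:04X}..U+{last:04X} -> {prop}")
# prints: U+180B..U+180D -> Default_Ignorable_Code_Point
```

Stripping the comment before splitting on ";" is the design choice that matters here, since the comment text itself may contain both ".." and other punctuation.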
***********************************************************************

Unicode 4.0.0:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs - White_Space

180B..180D    ; Other_Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

In Unicode 4.0.0, the code point ranges were removed from the derivation statement, and were moved instead into the definition of Other_Default_Ignorable_Code_Point.

***********************************************************************

Unicode 4.0.1:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters - White_Space - Annotation_characters

180B..180D    ; Variation_Selector # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

In Unicode 4.0.1, a new property, Variation_Selector, was added to the standard, and was incorporated into the derivation of Default_Ignorable_Code_Point. At that time, the variation selectors were removed from Other_Default_Ignorable_Code_Point, in part because of a push to ensure that the Other_XYZ properties were minimized to only include those characters not otherwise accounted for in the derivations by other properties or lists.

The *problem* is that the comment in the DerivedCoreProperties.txt file documenting the derivation didn't catch up to the actual derivation.
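The gap can be made concrete by evaluating the derivation as set operations, once as the 4.0.1 comment documents it and once with Variation_Selector added. This is only a sketch in Python over a tiny hand-built table of six code points (it does not read the real UCD files); the names SAMPLE and derive are illustrative, and treating the actual derivation as "the documented terms plus Variation_Selector" is an assumption based on the description above.

```python
# Hand-built sample, NOT read from the UCD:
# code point -> (General_Category, set of binary properties it has).
SAMPLE = {
    0x0041: ("Lu", set()),                        # LATIN CAPITAL LETTER A
    0x0009: ("Cc", {"White_Space"}),              # CHARACTER TABULATION
    0x200D: ("Cf", set()),                        # ZERO WIDTH JOINER
    0x180B: ("Mn", {"Variation_Selector"}),       # MONGOLIAN FREE VARIATION SELECTOR ONE
    0xFFF9: ("Cf", set()),                        # INTERLINEAR ANNOTATION ANCHOR
    0xFDD0: ("Cn", {"Noncharacter_Code_Point"}),  # a noncharacter
}
ANNOTATION_CHARACTERS = set(range(0xFFF9, 0xFFFC))  # FFF9..FFFB


def with_gc(*categories: str) -> set:
    """Sample code points whose General_Category is one of `categories`."""
    return {cp for cp, (gc, _) in SAMPLE.items() if gc in categories}


def with_prop(name: str) -> set:
    """Sample code points that have the binary property `name`."""
    return {cp for cp, (_, props) in SAMPLE.items() if name in props}


def derive(include_variation_selector: bool) -> set:
    # Other_DICP + Cf + Cc + Cs + Noncharacters [+ Variation_Selector]
    #   - White_Space - Annotation_characters
    included = (with_prop("Other_Default_Ignorable_Code_Point")
                | with_gc("Cf", "Cc", "Cs")
                | with_prop("Noncharacter_Code_Point"))
    if include_variation_selector:
        included |= with_prop("Variation_Selector")
    return included - with_prop("White_Space") - ANNOTATION_CHARACTERS


documented = derive(include_variation_selector=False)
actual = derive(include_variation_selector=True)
print(sorted(f"{cp:04X}" for cp in actual - documented))
# prints: ['180B']
```

Read literally, the documented comment leaves out U+180B because, after 4.0.1, nothing in Other_Default_Ignorable_Code_Point, Cf, Cc, Cs, or the noncharacters covers it; only the Variation_Selector term brings it back in.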
***********************************************************************

Unicode 5.0.0:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters
#    - White_Space - FFF9..FFFB (Annotation Characters)

180B..180D    ; Variation_Selector # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE

In Unicode 5.0, the comment in DerivedCoreProperties.txt was modified slightly, but the omission of Variation_Selector from the specification of the derivation, inherited from the 4.0.1 version of the file, was overlooked. So the documentation of the derivation is still incomplete there.

***********************************************************************