L2/01-312

From: Sandra O'donnell USG [odonnell@zk3.dec.com]
Sent: Thursday, August 09, 2001 11:02 AM
Rough comments on L2/01-282 (European generic locales)

Comments on L2/01-282 (European generic locales - Part 2: Narrative cultural specifications, POSIX locales, and repertoiremap)

Because I'm assuming most people haven't had a chance to read this yet, here's a summary of this CEN document. It includes:

* A repertoiremap of European characters that probably matches MES-2 (confirmation pending), that uses the Danish mnemonics for characters (e.g., for E-caron; for Cyrillic A-with-diaeresis; for Greek small eta with dasia and varia), and that includes ISO/IEC 10646/Unicode identifiers (Uxxxx) as comments.

* A generic _EU locale that includes character classification data (upper, lower, punct, etc.), a collation order, numeric formatting, monetary formatting using the euro as the currency symbol, a generic date/time section that uses numbers for all month and day names rather than language-specific strings, and generic yes/no responses ("+" for affirmative, "-" for negative). This file uses the Danish mnemonics only; no Uxxxx identifiers.

* A set of 14 country-specific narrative cultural specifications that describe in words the contents of the accompanying POSIX locales.

* A set of 14 country-specific POSIX locales. All these locales use the generic _EU definitions for classification, collation, monetary, and numeric information with no modifications. The only locale-specific information is in LC_TIME, which lists language-specific month and weekday names (but defaults to the generic locale for formatting rules), and in the yes/no responses.

The following comments refer to the repertoire map first, and then the _EU locale. Please let me know if you have questions or comments. I'll be in the office today (Thu, Aug 9), and then On the Road to Redmond.

-- Sandra
-----------------------
Sandra Martin O'Donnell
Compaq Computer Corporation
sandra.odonnell@compaq.com
odonnell@zk3.dec.com

**************IN THE REPERTOIRE MAP:

* What is the rationale for membership in the repertoire map? It includes Latin, Greek, and Cyrillic characters, as well as control characters, some punctuation, dingbats, and others. But the repertoire seems unusual. For example, from the Superscripts and Subscripts block of ISO/IEC 10646, this rep. map includes only U207F [SUPERSCRIPT LATIN SMALL LETTER N] and U2082 [SUBSCRIPT TWO]. Why include those two, but not include, say, the subscripts 0, 1, 3, 4, etc., which also are in this block?

  Why include the ligatures fi and fl from the Alphabetic Presentation Forms (UFB01 and UFB02), but not include ligatures ff or ffi (UFB00 and UFB03) in the same block?

  Why include six of the 60 characters from the Letterlike Symbols block, but not others? Those included are:

    U2105 CARE OF
    U2113 SCRIPT SMALL L
    U2116 NUMERO SIGN
    U2122 TRADE MARK SIGN
    U2126 OHM SIGN
    U212E ESTIMATED SYMBOL

  Others from this block include U2103 (DEGREE CELSIUS), U2107 (EULER CONSTANT), U2112 (SCRIPT CAPITAL L [counterpart to U2113, which is included]), and many others. Given the ones that *are* included, the list of those that are not seems odd.

  Why include all of the Greek characters, but only a subset of Latin characters?

* The repertoire map should not use the Danish mnemonics. It should use only the Uxxxx identifiers. This would be consistent with ISO/IEC 14651 and with ISO/IEC 10646.

* Near the end of the repertoire map, some characters are repeated, but with different mnemonics. They are:

    Character  1st mne.  2nd mne.
    NUMBER SIGN
    DOLLAR SIGN
    COMMERCIAL AT  <@> (also includes as 3rd mne.
    CENT SIGN
    POUND SIGN
    CURRENCY SIGN
    YEN SIGN
    BROKEN BAR
    SECTION SIGN
    NOT SIGN  <7!>
    PILCROW SIGN  <9I>

  Why are these repeated? This is confusing.

* At the very end of the repertoire map, there is a group of box drawing characters.
Earlier in the map, a larger group of such characters is defined. At the end, it includ