L2/04-250

Source: Sandra O'Donnell
Date: 2004-06-22 00:15:31 -0700
Subject: Punctuation in display strings

Rick,

I know the deadline has passed for UTC #99. If you can make this an L2 doc and add it to the register, that's great. Otherwise, I'll bring copies to Toronto.

Update to attached message -- The change Deborah recommended was made in CLDR 1.1. This can be revised in CLDR 1.2, if necessary.

The issue for the UTC, I believe, is whether we want to push for more correct usage of certain punctuation characters (e.g., U+2019 rather than U+0027) in all Unicode documents and data files. If so, I believe we should be clear about what change we are proposing. I do not believe it is a good idea to "fix" punctuation characters in a few Unicode-related files while maintaining the old ASCII-based usage in other files.

Regards,
-- Sandra

-----------------------
Sandra Martin O'Donnell
Hewlett Packard Company
sandra.odonnell@hp.com
odonnell@zk3.dec.com

-----Original Message-----
From: Sandra O'Donnell
Sent: Thursday, June 10, 2004 2:05 PM
To: 'Deborah Goldsmith'; cldr
Subject: RE: Punctuation in display strings

Clearly, I'm behind in email, but I don't remember seeing any response to this.

I have serious concerns about making this proposed change. Going from the text of TUS, it appears that U+2019 is the more correct character to use. However, it seems like tilting at windmills to use U+2019 within translations like these. That's because of the vast, vast, vast, vast (have I made my point about how big this is? :-) ) existing usage of U+0027 as the apostrophe. I've already used it multiple times in this message, and I'd guess that virtually every English email message also "misuses" U+0027 as the apostrophe rather than U+2019.

According to TUS 4.0 (pg. 159), "U+0027 APOSTROPHE is the most commonly used character for apostrophe. However, it has ambiguous semantics and direction. When text is set, U+2019 RIGHT SINGLE QUOTATION MARK is preferred as apostrophe.
Word processors commonly offer a facility for automatically converting the U+0027 APOSTROPHE to a contextually selected curly quotation glyph."

Besides the unfortunate fact that U+2019's name seems designed to obfuscate its intention to be the preferred apostrophe, there's the question of what most software systems will expect when looking for a match. My educated guess is that users would type in the name using U+0027 rather than with the "preferred" U+2019. And they won't get a match.

I suspect this change would create many more headaches than it would solve. And that unless we're prepared to push for full replacement of U+0027 with U+2019 for all uses of the apostrophe in all Unicode-based text, it is incorrect to try to make the change within a few CLDR data files.

Other opinions?

-- Sandra

-----Original Message-----
Sent: Tuesday, May 18, 2004 9:14 PM
From: Deborah Goldsmith
To: cldr
Subject: Punctuation in display strings

I've raised the issue before and thought we had agreement, but I notice it wasn't fixed in the TOT sources in CVS, so I'd like to raise it again.

The issue is the use of 7-bit ASCII punctuation in user-visible display strings. For example, from the en.txt locale (hex escapes converted to Unicode):

    CI { "Côte d'Ivoire" }

This should really be:

    CI { "Côte d’Ivoire" }

One could go even farther and argue that hyphens in examples like the following:

    TL { "Timor-Leste" }

should be U+2013 EN DASH instead of U+002D HYPHEN-MINUS. We don't have to go that far. :-)

However, U+0027 APOSTROPHE does not fit in with a modern user interface. All of Apple's products uniformly use proper typography in user-visible strings (not including e-mails typed by lazy typists...).

It would be fairly easy for Chris and I to replace U+0027 APOSTROPHE with whatever the correct character is (U+2019 RIGHT SINGLE QUOTATION MARK for punctuation, U+02BC MODIFIER LETTER APOSTROPHE or a neighbor for glottal stop, etc.). The number of occurrences is quite small.
Is there any objection to making this kind of change?

Deborah
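
The two positions in this thread can be illustrated with a small sketch. It assumes Python and uses hypothetical helper names (typeset_apostrophes, fold_apostrophes, names_match) that do not come from CLDR, ICU, or any Unicode data file. It shows the proposed replacement of U+0027 with U+2019 in a display string, the exact-match failure that results when a user types U+0027, and one possible mitigation: folding both characters to a single form before comparing. The blanket replacement is only reasonable where U+0027 is a punctuation apostrophe; it would be wrong for glottal stops, where the thread suggests U+02BC or a neighbor.

```python
APOSTROPHE = "\u0027"                   # U+0027 APOSTROPHE
RIGHT_SINGLE_QUOTATION_MARK = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK


def typeset_apostrophes(display_name: str) -> str:
    """Replace U+0027 with U+2019 in a display string.

    Assumption: every U+0027 in the input is a punctuation apostrophe.
    Glottal stops and opening single quotes need different characters
    and are not handled here.
    """
    return display_name.replace(APOSTROPHE, RIGHT_SINGLE_QUOTATION_MARK)


def fold_apostrophes(text: str) -> str:
    """Fold U+2019 back to U+0027 so typed input can match stored names."""
    return text.replace(RIGHT_SINGLE_QUOTATION_MARK, APOSTROPHE)


def names_match(typed: str, stored: str) -> bool:
    """Compare two names while ignoring the U+0027 / U+2019 distinction."""
    return fold_apostrophes(typed) == fold_apostrophes(stored)


stored = typeset_apostrophes("Côte d'Ivoire")  # as the data file would hold it
typed = "Côte d'Ivoire"                        # as a user would likely type it

print(typed == stored)             # exact comparison fails
print(names_match(typed, stored))  # folded comparison succeeds
```

Folding at comparison time would let the data files carry U+2019 for display while still accepting U+0027 as input, but it only helps in software that applies such folding, which is the compatibility concern raised above.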