L2/04-250

Source: Sandra O'Donnell
Date: 2004-06-22 00:15:31 -0700
Subject: Punctuation in display strings

Rick, I know the deadline has passed for UTC #99. If you can make this an L2
doc and add it to the register, that's great. Otherwise, I'll bring copies
to Toronto.

Update to attached message -- The change Deborah recommended was made in
CLDR 1.1. This can be revised in CLDR 1.2, if necessary. The issue for the
UTC, I believe, is whether we want to push for more correct usage of certain
punctuation characters (e.g., U+2019 rather than U+0027) in all Unicode
documents and data files. If so, I believe we should be clear about what
change we are proposing. I do not believe it is a good idea to "fix"
punctuation characters in a few Unicode-related files while maintaining the
old ASCII-based usage in other files.

		Regards,
		-- Sandra
-----------------------
Sandra Martin O'Donnell
Hewlett Packard Company
sandra.odonnell@hp.com
odonnell@zk3.dec.com 

-----Original Message-----
From: Sandra O'Donnell
Sent: Thursday, June 10, 2004 2:05 PM
To: 'Deborah Goldsmith'; cldr
Subject: RE: Punctuation in display strings

Clearly, I'm behind in email, but I don't remember seeing any response to
this. I have serious concerns about making this proposed change.

Going by the text of TUS, it appears that U+2019 is the more correct
character to use. However, it seems like tilting at windmills to use U+2019
within translations like these. That's because of the vast, vast, vast, vast
(have I made my point about how big this is? :-) ) existing usage of U+0027
as the apostrophe. I've already used it multiple times in this message, and
I'd guess that virtually every English email message also "misuses" U+0027
as the apostrophe rather than U+2019.

According to TUS 4.0 (pg. 159), "U+0027 APOSTROPHE is the most commonly used
character for apostrophe. However, it has ambiguous semantics and direction.
When text is set, U+2019 RIGHT SINGLE QUOTATION MARK is preferred as
apostrophe. Word processors commonly offer a facility for automatically
converting the U+0027 APOSTROPHE to a contextually selected curly quotation
glyph."

Besides the unfortunate fact that U+2019's name seems designed to obfuscate
its intention to be the preferred apostrophe, there's the question of what
most software systems will expect when looking for a match. My educated
guess is that users would type in the name using U+0027 rather than with the
"preferred" U+2019. And they won't get a match. 

I suspect this change would create many more headaches than it would solve.
I also think that unless we're prepared to push for full replacement of
U+0027 with U+2019 for all uses of the apostrophe in all Unicode-based text,
it is incorrect to try to make the change within a few CLDR data files.
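
To make the matching concern concrete, here is a minimal Python sketch. It
is only an illustration: the names APOSTROPHE_FOLD and fold_apostrophes are
hypothetical and are not part of CLDR or any existing library. It assumes the
stored display name uses U+2019 while the user types U+0027, and shows that a
plain comparison fails unless both sides are folded to one character first.

```python
# Hypothetical folding table: map apostrophe-like characters to U+0027.
# Which characters belong here is an assumption made for this sketch.
APOSTROPHE_FOLD = str.maketrans({"\u2019": "'", "\u02BC": "'"})


def fold_apostrophes(text: str) -> str:
    """Return text with U+2019 and U+02BC replaced by U+0027."""
    return text.translate(APOSTROPHE_FOLD)


stored = "C\u00f4te d\u2019Ivoire"  # display name using U+2019
typed = "C\u00f4te d'Ivoire"        # what a user would likely type (U+0027)

# A code-point comparison treats the two strings as different.
naive_match = stored == typed

# Folding both sides before comparing restores the match.
folded_match = fold_apostrophes(stored) == fold_apostrophes(typed)

print(naive_match, folded_match)
```

Any software that compares user input against these names would need
something along these lines, which is part of the cost of the change.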

Other opinions?

		-- Sandra

-----Original Message-----
Sent: Tuesday, May 18, 2004 9:14 PM
From: Deborah Goldsmith
To: cldr
Subject: Punctuation in display strings

I've raised the issue before and thought we had agreement, but I notice 
it wasn't fixed in the TOT sources in CVS, so I'd like to raise it 
again.

The issue is the use of 7-bit ASCII punctuation in user-visible display 
strings. For example, from the en.txt locale (hex escapes converted to 
Unicode):

         CI { "Côte d'Ivoire" }

This should really be:

         CI { "Côte d’Ivoire" }

One could go even further and argue that hyphens in examples like the 
following:

         TL { "Timor-Leste" }

should be U+2013 EN DASH instead of U+002D HYPHEN-MINUS. We don't have 
to go that far. :-) However, U+0027 APOSTROPHE does not fit in with a 
modern user interface. All of Apple's products uniformly use proper 
typography in user-visible strings (not including e-mails typed by lazy 
typists...).

It would be fairly easy for Chris and me to replace U+0027 APOSTROPHE 
with whatever the correct character is (U+2019 RIGHT SINGLE QUOTATION 
MARK for punctuation, U+02BC MODIFIER LETTER APOSTROPHE or a neighbor 
for glottal stop, etc.). The number of occurrences is quite small.
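
As a rough sketch of what the mechanical part of that replacement could look
like, the Python below rewrites U+0027 to U+2019 only when it sits between two
letters. The function name curl_apostrophes is hypothetical, and this is not
the actual CLDR tooling. It cannot tell a punctuation apostrophe from a
glottal stop, so strings that need U+02BC would still have to be picked out
by hand.

```python
import re

# U+0027 with a letter on each side. [^\W\d_] matches any Unicode letter
# (a word character that is neither a digit nor an underscore).
_LETTER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def curl_apostrophes(display_name: str) -> str:
    """Replace letter-flanked U+0027 with U+2019 RIGHT SINGLE QUOTATION MARK.

    Assumption for this sketch: every such apostrophe is punctuation.
    Glottal stops (U+02BC) are not detected and need manual review.
    """
    return _LETTER_APOSTROPHE.sub("\u2019", display_name)


converted = curl_apostrophes("C\u00f4te d'Ivoire")
unchanged = curl_apostrophes("Timor-Leste")
print(converted, unchanged)
```

Hyphens and apostrophes at the start or end of a string are left alone, which
matches the limited scope proposed here.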

Is there any objection to making this kind of change?

Deborah