From: Mark Davis (email@example.com)
Date: Thu Nov 17 2005 - 17:51:04 CST
It is not that clear-cut. Identifiers by their nature cannot include all
words and phrases valid in all languages. For IDN, for example, one
can't express the perfectly reasonable English word "can't", or a word
I did introduce a proposal in March for considering the status of some
word characters, which turned into a discussion in the UTC of whether
to add certain items to the identifier definition.
I'll copy that section here for those without access:
0027 ; # Po APOSTROPHE
002D ; # Pd HYPHEN-MINUS
002E ; # Po FULL STOP
003A ; # Po COLON
00B7 ; # Po MIDDLE DOT
058A ; # Pd ARMENIAN HYPHEN
05F3 ; # Po HEBREW PUNCTUATION GERESH
05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
200C ; # Cf ZERO WIDTH NON-JOINER // for Indic?
200D ; # Cf ZERO WIDTH JOINER // for Indic?
2010 ; # Pd HYPHEN
2019 ; # Pf RIGHT SINGLE QUOTATION MARK
2027 ; # Po HYPHENATION POINT
30A0 ; # Pd KATAKANA-HIRAGANA DOUBLE HYPHEN
The UTC decided against adding them to the identifier definition.
If we were to change that for the Hebrew punctuation, we would have to
see a documented case for it.
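Each entry in the list uses the UnicodeData-style layout "code point ; # General_Category NAME", with an optional "// note". The following minimal sketch checks those annotations against the Unicode Character Database bundled with Python's `unicodedata` module. The helper name `check_entries` and the sample subset are illustrative assumptions, not part of the original proposal.

```python
import unicodedata

# A subset of the entries above, in the "XXXX ; # Cat NAME // note" layout.
ENTRIES = """\
0027 ; # Po APOSTROPHE
002D ; # Pd HYPHEN-MINUS
05F3 ; # Po HEBREW PUNCTUATION GERESH
05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
200C ; # Cf ZERO WIDTH NON-JOINER // for Indic?
2010 ; # Pd HYPHEN
30A0 ; # Pd KATAKANA-HIRAGANA DOUBLE HYPHEN
"""


def check_entries(text):
    """Return (code point, problem) pairs where an annotation disagrees
    with the character data that ships with this Python build."""
    problems = []
    for line in text.splitlines():
        # Drop any trailing "// ..." note, then skip blank lines.
        line = line.split("//")[0].strip()
        if not line:
            continue
        # Split "XXXX ; " from "Cat NAME".
        cp_field, comment = line.split("#", 1)
        cp = int(cp_field.split(";")[0], 16)
        category, name = comment.strip().split(" ", 1)
        ch = chr(cp)
        if unicodedata.category(ch) != category:
            problems.append((cp, "category " + category))
        if unicodedata.name(ch, "") != name.strip():
            problems.append((cp, "name " + name))
    return problems


print(check_entries(ENTRIES))  # prints []
```

An entry with a wrong or missing category, such as "2010 ; # Po HYPHEN", would be reported as `(0x2010, "category Po")`.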
Michael Everson wrote:
> At 17:42 +0100 2005-11-17, Cary Karp wrote:
>>> "These punctuation marks may not be available in all fonts (and legacy
>>> encodings), so an implementation should be prepared to degrade
>>> U0027 APOSTROPHE for GERESH and U0022 QUOTATION MARK for GERSHAYIM are
>>> acceptable fallbacks."
>> The problem is that these fallbacks are not available in IDN under
>> any circumstances.
> If that is the case then surely the real characters must be allowed.