From: QSJN 4 UKR (email@example.com)
Date: Fri Feb 11 2011 - 04:42:37 CST
>"Jukka K. Korpela" <firstname.lastname@example.org>:
>Converting text to uppercase is always a matter of judgment. You should not assume that such a conversion can always be made without changing or distorting the information content. In fact, uppercase-converting “ms” as an SI notation would be, in a sense, worse than uppercase-converting “µs”. The latter would produce “ΜS”, a mix of Greek and Latin letters, therefore suspicious, and definitely incorrect as an SI notation even at the character level – the SI does not use capital mu at all. But uppercase-converting “ms” produces “MS”, which looks innocent and is correct as an SI notation, though it means megasiemens and not millisecond.
Note, though, that Greeks sometimes write SI notation in the Greek alphabet.
Letter case serves several different purposes. It is used stylistically,
for example capital or title-case letters in headings; grammatically,
where a capital letter marks the beginning of a sentence, a proper name,
or any noun in German; and semantically, for example in SI units or
chemical notation.
To support all these uses, it would be nice to have special control
characters in the text indicating where a change of case is admissible
and where it is not. Alternatively, SI, chemical and mathematical
notation (and perhaps even capitalized proper names?) could use
characters that have no case mapping, such as U+1D400 and the rest of
the Mathematical Alphanumeric Symbols.
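By the way, the fact that MICRO SIGN and GREEK SMALL LETTER MU are
compatibility equivalents does not mean they must have identical case
mappings. The Mathematical Alphanumeric Symbols are Lu and Ll, yet they
have no case mappings at all, even though they are compatibility
equivalents of ordinary letters that do.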
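For illustration, this is how Python's built-in case mapping (which follows the default mappings in UnicodeData.txt and SpecialCasing.txt) treats the characters discussed here. It is a quick check, not a normative reference; the helper names `codepoints` and `case_report` are made up for the example, and results come from whatever Unicode database the running Python bundles, although these particular mappings have been stable for a long time.

```python
import unicodedata

# Characters discussed above
SAMPLES = {
    "MICRO SIGN": "\u00b5",
    "GREEK SMALL LETTER MU": "\u03bc",
    "MATHEMATICAL BOLD CAPITAL A": "\U0001D400",
    "MATHEMATICAL BOLD SMALL A": "\U0001D41A",
}

def codepoints(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

def case_report(ch: str) -> dict:
    """Collect general category, NFKC form and default case mappings of one character."""
    return {
        "category": unicodedata.category(ch),
        "nfkc": codepoints(unicodedata.normalize("NFKC", ch)),
        "upper": codepoints(ch.upper()),
        "lower": codepoints(ch.lower()),
    }

if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(name, case_report(ch))
    # Uppercasing a unit symbol: micro becomes Greek capital mu, "s" becomes Latin "S"
    print(codepoints("\u00b5s".upper()))  # U+039C U+0053
    print("ms".upper())                   # MS (megasiemens, not millisecond)
```

The report shows the asymmetry: MICRO SIGN (Ll) uppercases to U+039C GREEK CAPITAL LETTER MU, while the bold mathematical letters keep their Lu/Ll categories but map to themselves under both `upper()` and `lower()`, even though NFKC folds them to plain "A" and "a".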
What the hell good is the stability of the Unicode Standard if it rules
out using the standard properly? This is an error: MICRO SIGN should not
be converted to uppercase, so it should have no case mapping at all.
It is impossible to get correct results from text transformation
procedures without control by upper-layer protocols (in fact, only by a
human being). So should we forget about the case mappings in
UnicodeData.txt and rely only on "upper-layer protocols"?
I think not. I think all these features could be supported by Unicode,
and the best way to do it (while keeping stability) is to extend the
basic casing algorithm to process newly added control characters for
special casing, just as LRE/RLE/LRO/RLO and PDF are used for bidi
control. For example:
1. always keep as is: for symbols in SI or chemical notation;
2. title or capital but not small: for proper names;
3. small or capital but not title: for function words such as
   prepositions and articles.
Those who want this can add these characters to their text and stop
worrying about errors in text transformation. Those who do not can do
without them and keep using the existing algorithm.
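The proposal above can be sketched roughly as follows, purely as an illustration under loud assumptions: Unicode defines no such casing controls, so Private Use Area code points U+E000 to U+E003 stand in for them, and the names KEEP, NO_LOWER, NO_TITLE, POP and convert_case are all hypothetical. By analogy with bidi embeddings closed by PDF, each control opens a span that POP closes, the innermost span decides what a case operation may do, and text outside any span gets the ordinary default mappings.

```python
from typing import List, Optional

# HYPOTHETICAL casing controls: Unicode has no such characters. These Private
# Use Area code points are stand-ins chosen only for this illustration.
KEEP = "\uE000"      # 1. never change case (SI or chemical notation)
NO_LOWER = "\uE001"  # 2. title or capital, but never small (proper names)
NO_TITLE = "\uE002"  # 3. small or capital, but never title (prepositions, articles)
POP = "\uE003"       # closes the innermost span, by analogy with bidi PDF

OPENERS = {KEEP, NO_LOWER, NO_TITLE}
OPERATIONS = ("upper", "lower", "title")

def _map_char(ch: str, op: str, mode: Optional[str], word_start: bool) -> str:
    """Map one character for op under the innermost active mode (None = no span)."""
    if mode == KEEP:
        return ch                          # protected span: never touched
    if op == "upper":
        return ch.upper()                  # every mode permits capitals
    if op == "lower":
        return ch if mode == NO_LOWER else ch.lower()
    # op == "title": word-initial letters get titlecase, the rest lowercase
    if word_start:
        return ch.lower() if mode == NO_TITLE else ch.title()
    return ch if mode == NO_LOWER else ch.lower()

def convert_case(text: str, op: str) -> str:
    """Case-convert text while honouring the hypothetical push/pop controls.

    Controls are copied to the output unchanged, as bidi controls survive
    case conversion. Word boundaries are naive: a word starts after any
    non-alphabetic character.
    """
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op!r}")
    stack: List[str] = []
    out: List[str] = []
    word_start = True
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
            out.append(ch)
        elif ch == POP:
            if stack:                      # a stray POP is ignored, like an unmatched PDF
                stack.pop()
            out.append(ch)
        else:
            mode = stack[-1] if stack else None
            out.append(_map_char(ch, op, mode, word_start))
            word_start = not ch.isalpha()
    return "".join(out)

if __name__ == "__main__":
    phrase = f"pulse {NO_TITLE}of{POP} 5 {KEEP}ms{POP} from {NO_LOWER}McDonald{POP} labs"
    visible = {ord(c): None for c in OPENERS | {POP}}  # hide the controls when printing
    for op in OPERATIONS:
        print(op, "->", convert_case(phrase, op).translate(visible))
```

With the controls hidden for display, the demo prints "PULSE OF 5 ms FROM MCDONALD LABS", "pulse of 5 ms from McDonald labs" and "Pulse of 5 ms From McDonald Labs": "ms" survives every operation, "McDonald" is never lowercased, and "of" is never titlecased. Without the controls, the same text would uppercase "ms" to "MS" and titlecase "McDonald" to "Mcdonald", which is exactly the kind of damage the proposal aims to prevent.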
This archive was generated by hypermail 2.1.5 : Fri Feb 11 2011 - 04:48:17 CST