Re: Case mappings

From: QSJN 4 UKR
Date: Fri Feb 11 2011 - 04:42:37 CST

    >"Jukka K. Korpela":
    >Converting text to uppercase is always a matter of judgment. You should not assume that such a conversion can always be made without changing or distorting the information content. In fact, uppercase-converting “ms” as an SI notation would be, in a sense, worse than uppercase-converting “µs”. The latter would produce “ΜS”, a mix of Greek and Latin letters, therefore suspicious, and definitely incorrect as an SI notation even at the character level – the SI does not use capital mu at all. But uppercase-converting “ms” produces “MS”, which looks innocent and is correct as an SI notation, though it means megasiemens and not millisecond.

    Don't you know that the Greeks sometimes use the Greek alphabet for SI notation?

    Letter case has several different uses. It is used stylistically,
    for example capital or title-case letters in headings;
    grammatically, where a capital letter marks the beginning of a
    sentence, a proper name, or any noun in German; and semantically,
    for example in SI units or chemical notation.

    To support all these uses, it would be nice to have special control
    characters in the text that indicate where a change of case is
    admissible and where it is not. Alternatively, SI, chemical and
    mathematical notation, and perhaps capitalized proper names (???),
    could use characters that have no case mapping, U+1D400 etc.

    By the way, the fact that micro and mu are compatibility equivalents
    does not mean that they should have identical case mappings (the
    Mathematical Alphanumeric Symbols are Lu and Ll but have no case
    mappings).
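
    The properties mentioned above can be checked with Python's standard
    unicodedata module. This is only an illustrative sketch; the variable
    names are chosen for this example, and the results reflect the
    Unicode data bundled with the Python build in use.

```python
import unicodedata

MICRO = "\u00b5"            # MICRO SIGN
MU = "\u03bc"               # GREEK SMALL LETTER MU
BOLD_CAP_A = "\U0001d400"   # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = "\U0001d41a" # MATHEMATICAL BOLD SMALL A

# Compatibility equivalence: NFKC folds the micro sign into Greek mu.
micro_folds_to_mu = unicodedata.normalize("NFKC", MICRO) == MU

# Case mapping: the micro sign currently uppercases to a Greek capital letter.
micro_upper_name = unicodedata.name(MICRO.upper())

# The mathematical letters are compatibility equivalents of Latin letters
# and carry the categories Lu and Ll, yet they have no case mappings.
bold_folds_to_latin = unicodedata.normalize("NFKC", BOLD_CAP_A) == "A"
bold_categories = (
    unicodedata.category(BOLD_CAP_A),
    unicodedata.category(BOLD_SMALL_A),
)
bold_unchanged = (
    BOLD_CAP_A.lower() == BOLD_CAP_A and BOLD_SMALL_A.upper() == BOLD_SMALL_A
)
```

    So a compatibility equivalent without a case mapping already exists
    in the standard; the micro sign just is not one of them.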

    What the hell good is the stability of the Unicode Standard if it
    rules out actually using the standard? There is an error here: the
    Micro Sign should not be converted to uppercase, so it should not
    have a case mapping at all.

    It is impossible to get correct results from text transformation
    procedures unless they are controlled by upper-layer protocols (in
    fact, only by a human being). Maybe forget about the case mappings
    in UnicodeData.txt and use only the "upper-layer protocols"?
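
    As a rough illustration of what such upper-layer control might look
    like, here is a sketch in which the caller tells the conversion which
    spans must not be touched. The function upper_except and its span
    format are hypothetical, invented for this example; the spans are
    assumed to be non-overlapping.

```python
def upper_except(text, protected_spans):
    """Uppercase text, copying each [start, end) span unchanged.

    protected_spans is an iterable of non-overlapping (start, end) pairs
    supplied by the caller, i.e. by the "upper-layer protocol".
    """
    out = []
    pos = 0
    for start, end in sorted(protected_spans):
        out.append(text[pos:start].upper())  # ordinary text: convert
        out.append(text[start:end])          # protected span: copy as is
        pos = end
    out.append(text[pos:].upper())
    return "".join(out)


sample = "pulse width 5 \u00b5s, delay 3 ms"
i = sample.index("\u00b5s")
j = sample.index("ms")
protected = [(i, i + 2), (j, j + 2)]

plain = sample.upper()                       # units are distorted
guarded = upper_except(sample, protected)    # units survive
```

    The price is that the knowledge about the units lives outside the
    text and is lost as soon as the text is passed on without it.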

    I think not. I think all these features can be supported by Unicode.
    The best way (for keeping stability) is to change the basic casing
    algorithm so that it processes newly added control characters for
    special casing (e.g. 1. keep always, for symbols such as SI or
    chemical notation; 2. title or capital but not small, for proper
    names; 3. small or capital but not title, for service words:
    prepositions, articles), just like "LR/RL-O/E-PDF" for special bidi
    control. Those who want to can add them to the text and forget about
    the possibility of errors in text transformation. Those who do not
    can do without them and use the currently existing algorithm.
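
    To make the idea concrete, here is a minimal sketch of one possible
    reading of it. No such control characters exist in Unicode; the four
    code points below are taken from the Private Use Area purely as
    placeholders, and the names KEEP, NOT_SMALL, NOT_TITLE, POP, convert
    and strip_controls are hypothetical. The sketch assumes that spans do
    not nest and that a forbidden operation leaves the span as written.

```python
import re

# Hypothetical control characters (Private Use Area placeholders only).
KEEP = "\ue000"       # 1. never change case (SI, chemical notation)
NOT_SMALL = "\ue001"  # 2. title or capital, never small (proper names)
NOT_TITLE = "\ue002"  # 3. small or capital, never title (service words)
POP = "\ue003"        # closes a span, in the manner of PDF for bidi

# Operations that each kind of span refuses.
FORBIDDEN = {
    KEEP: {"upper", "lower", "title"},
    NOT_SMALL: {"lower"},
    NOT_TITLE: {"title"},
}
CASE_OPS = {"upper": str.upper, "lower": str.lower, "title": str.title}
SPAN = re.compile("([\ue000-\ue002])([^\ue003]*)\ue003")


def convert(text, op):
    """Apply op outside marked spans and inside spans that permit it."""
    fn = CASE_OPS[op]
    out = []
    pos = 0
    for m in SPAN.finditer(text):
        out.append(fn(text[pos:m.start()]))
        mark, body = m.group(1), m.group(2)
        # A forbidden operation leaves the span exactly as written.
        kept = body if op in FORBIDDEN[mark] else fn(body)
        out.append(mark + kept + POP)   # controls stay in the text
        pos = m.end()
    out.append(fn(text[pos:]))
    return "".join(out)


def strip_controls(text):
    """Remove the placeholder control characters for display."""
    return re.sub("[\ue000-\ue003]", "", text)


units = "delay 5 " + KEEP + "\u00b5s" + POP + " in " + NOT_SMALL + "Kyiv" + POP
heading = ("the war " + NOT_TITLE + "of" + POP + " "
           + NOT_TITLE + "the" + POP + " worlds")

upper_units = strip_controls(convert(units, "upper"))
lower_units = strip_controls(convert(units, "lower"))
title_heading = strip_controls(convert(heading, "title"))
```

    Text without any of the controls goes through the ordinary
    conversion unchanged, which matches the point that whoever does not
    want them can do without them.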

    This archive was generated by hypermail 2.1.5 : Fri Feb 11 2011 - 04:48:17 CST