Re: Case mappings

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Mar 13 2011 - 15:47:00 CST


    2011/2/11 Doug Ewell <doug@ewellic.org>:
    > QSJN 4 UKR <qsjn4ukr at gmail dot com> wrote:
    >
    >> There are several different applications of the letter cases. They
    >> are used stylistically, for example, the using a capital or title
    >> letters in the headers, grammatically, when the capital letter
    >> identifies the beginning of the sentence, the proper name, any name
    >> in German, and semantically, for example, in SI units or chemical
    >> symbols.
    >
    > This is exactly why it is inappropriate to apply case-change operations
    > indiscriminately to arbitrary snippets of text.  This is not unique to
    > SI prefixes (or units) or Unicode compatibility characters; it's not
    > even really a computer problem.  It would be just as inappropriate, as
    > Jukka pointed out, to uppercase a symbol like "ms" which consists of
    > ordinary letters, whether in Unicode or in handwriting.
    >
    >> To support all these cases, it would be nice to use special control
    >> characters in the text, which would indicate where the change in the
    >> case is admissible and where is not. Or to use for the SI, chemical
    >> and mathematical notation and - for capitalization of proper names
    >> (???) - those characters who have no case mapping, U+1D400 etc.
    >
    > Modifying all existing electronic text to include such an invisible
    > control character,
    Why « all » texts? That was not in the proposal.

    > and requiring all users and processes to enter it
    > reliably,

    Why « all » users? Again, that was not in the proposal. In fact,
    every character is encoded for some unspecified number of users,
    possibly small, but never for all users. The character would exist
    for those users for whom the difference matters.

    > and modifying all keyboards to include a key for this new
    > character, doesn't seem particularly likely at this time.

    Why modify « all » keyboards? Keyboards are very likely to be
    extended, possibly by users themselves or through helper tools,
    without modifying any keyboard physically or even changing its
    driver software.

    Just consider French keyboards: they cannot enter all the characters
    that are preferred for French, but this does not prevent anyone from
    installing and using add-ons that allow entering every character
    needed for French (notably capital letters with accents, or
    guillemets). Various solutions have been developed and are in use
    today, even if there is still no standard mapping universally
    adopted by French typists.

    > Better to teach users to use common sense when applying text-transformation
    > operations like uppercasing.

    You can just as well teach them how to enter the character in those
    same situations, and then the rest of the software will VERY likely
    support the correct case mappings, for rendering or transforms.

    >> What the hell good on the stability of the Unicode standard, if it
    >> excludes the possibility of using it.
    >
    > Using a character encoding standard does require a modicum of knowledge
    > about how plain text works.

    It's definitely not a problem of stability of the standard, because
    nothing needs to be changed in the existing characters. Adding a
    combining character will not break compatibility. Admittedly,
    software updates will be needed to support the new character, but
    that is exactly the same situation as when any new character, or
    even a complete script, is encoded.

    Unicode already has invisible characters for mathematical formulas,
    such as the invisible multiplication operator (U+2062 INVISIBLE
    TIMES), the invisible function application (U+2061 FUNCTION
    APPLICATION), and the invisible index separator (U+2063 INVISIBLE
    SEPARATOR). Given the contexts where the invisible combining
    character would be used (such as units of measurement), its scope is
    limited and falls within the same technical domain as those existing
    characters.
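
    For readers who want to check the invisible mathematical characters
    mentioned above, here is a small Python sketch that looks them up in
    the Unicode Character Database through the standard unicodedata
    module. The helper name describe() is only illustrative.

```python
import unicodedata

# Invisible mathematical operators that are already encoded in Unicode.
INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in INVISIBLE_OPERATORS:
    # Each of these is a format character (category Cf) with no visible glyph.
    print(*describe(ch))
```

    All three are format characters with no case mapping, so uppercasing
    or lowercasing a formula leaves them untouched.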

    Then, the new character will allow easier processing of texts: even
    if some software that ignores the new combining character applies a
    case mapping blindly, the character will remain and will still allow
    a renderer to display the base letter plus the new combining
    character in the expected case, even though the base letter has been
    remapped to another case. No semantics will be lost. And texts could
    still be canonicalized at any time by replacing a combination of
    CAPITAL LETTER plus INVISIBLE LOWERCASE with SMALL LETTER plus
    INVISIBLE LOWERCASE.
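
    The behaviour described above can be sketched in Python. This is
    only an illustration under stated assumptions: no INVISIBLE
    LOWERCASE character exists in Unicode, so a Private Use Area code
    point (U+E000) stands in for it, and the function names
    mark_lowercase(), canonicalize() and render() are hypothetical.

```python
# Hypothetical stand-in: INVISIBLE LOWERCASE is not an encoded character.
# A Private Use Area code point is used purely for illustration.
INVISIBLE_LOWERCASE = "\uE000"


def mark_lowercase(symbol: str) -> str:
    """Append the marker after every letter of a symbol that must stay lowercase."""
    return "".join(ch + INVISIBLE_LOWERCASE if ch.isalpha() else ch for ch in symbol)


def canonicalize(text: str) -> str:
    """Replace <capital letter, marker> with <small letter, marker>."""
    out = []
    for i, ch in enumerate(text):
        followed_by_marker = text[i + 1 : i + 2] == INVISIBLE_LOWERCASE
        out.append(ch.lower() if followed_by_marker else ch)
    return "".join(out)


def render(text: str) -> str:
    """What a marker-aware renderer would display: canonical case, marker hidden."""
    return canonicalize(text).replace(INVISIBLE_LOWERCASE, "")


marked = "10 " + mark_lowercase("ms")  # a duration in milliseconds
shouted = marked.upper()               # blind case mapping by unaware software

print(repr(shouted))      # base letters are now capitals, markers survive
print(render(shouted))    # the renderer restores the expected case
```

    Because the stand-in has no case mapping, it passes through upper()
    unchanged, and canonicalize(shouted) yields the originally marked
    text again.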


