Re: Caron / Hacek?

From: John Hudson
Date: Wed Mar 05 2003 - 14:12:30 EST

    At 08:35 AM 3/5/2003, John Cowan wrote:

    > > Then why does UnicodeData break them down as (e.g.) 0064 030C rather than
    > > 0064 0315?
    >To keep the upper case and lower case characters in sync for decomposition,
    >they always have the same combining characters.

    Yes. There is nothing technically or grammatically incorrect about thinking
    of d', l' and t' as letters with 'carons': it is only typographically
    incorrect to represent them with the typical caron mark. The encoding of
    characters and the visual representation of characters do not always
    directly correspond.
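To make the point about decomposition concrete, here is a minimal sketch that reads the canonical decomposition of the upper and lower case d, l and t with caron. It assumes Python's standard `unicodedata` module; the helper names `canonical_decomposition` and `combining_marks` and the `PAIRS` list are illustrative only and are not from the thread.

```python
import unicodedata

# Upper/lower pairs discussed in the thread: D/d, L/l and T/t with caron.
PAIRS = [("\u010E", "\u010F"), ("\u013D", "\u013E"), ("\u0164", "\u0165")]


def canonical_decomposition(ch: str) -> str:
    """Return the decomposition field recorded for a single character."""
    return unicodedata.decomposition(ch)


def combining_marks(ch: str) -> list[str]:
    """Return the combining marks left after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [c for c in decomposed if unicodedata.combining(c)]


if __name__ == "__main__":
    for upper, lower in PAIRS:
        for ch in (upper, lower):
            marks = ", ".join(unicodedata.name(m) for m in combining_marks(ch))
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
                  f"{canonical_decomposition(ch)} ({marks})")
```

Both members of each pair report the same combining mark, U+030C COMBINING CARON, even though a font would typically draw the lower case d, l and t with an apostrophe-like shape. The character data says nothing about that glyph choice.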

    >For another example, G with
    >cedilla gets the cedilla on top when it's a capital, but it still decomposes
    >to the ordinary combining cedilla. These are essentially font-ligaturing

    Not quite, in that the font does not necessarily require ligature
    substitution data for characters that are encoded in Unicode in precomposed
    forms. Systems and applications should take care of canonical composition,
    not fonts.

    By the way, although Unicode calls it a cedilla, the correct form to use
    with G is the disconnected, 'under comma' form.
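As a rough illustration of composition being handled outside the font, the sketch below lets a normalization library perform canonical composition (NFC) on G and g followed by the combining cedilla. It assumes Python's standard `unicodedata` module; the helper name `compose` is hypothetical and only there for the example.

```python
import unicodedata

COMBINING_CEDILLA = "\u0327"


def compose(text: str) -> str:
    """Apply canonical composition (NFC) to a string."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for base in ("G", "g"):
        sequence = base + COMBINING_CEDILLA
        composed = compose(sequence)
        # Show the input code points, the composed code point and its name.
        inputs = " ".join(f"{ord(c):04X}" for c in sequence)
        print(f"{inputs} -> U+{ord(composed):04X} {unicodedata.name(composed)}")
        # The precomposed character maps back to the same combining cedilla.
        print(f"  decomposition: {unicodedata.decomposition(composed)}")
```

The composed characters are named "WITH CEDILLA" and decompose to U+0327 in both cases. Whether the mark is drawn above the lower case g, or as a disconnected comma shape, is left entirely to the font's glyph for the precomposed character.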

    John Hudson

    Tiro Typeworks
    Vancouver, BC

    It is necessary that by all means and cunning,
    the cursed owners of books should be persuaded
    to make them available to us, either by argument
    or by force. - Michael Apostolis, 1467
