Re: Accented ij ligatures (was: Unicode Public Review Issues update)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jul 01 2003 - 11:41:51 EDT

    On Tuesday, July 01, 2003 4:09 PM, Pim Blokland <pblokland@planet.nl> wrote:
    > Maybe it was a bad idea to include ij as a character in Unicode at
    > all, but now it's there, there's no reason to ignore it when
    > refining the rules, to deprecate it practically.

    No, it was needed for correct Dutch support. Look at the case
    conversion of <ij> into <IJ>, which applies even in titlecase.
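
    To illustrate the titlecase point, here is a minimal sketch using
    Python's built-in string methods. It assumes the <ij> and <IJ>
    discussed here are U+0133 and U+0132; the helper name
    titlecase_word and the sample word are only illustrative.

```python
# U+0133 LATIN SMALL LIGATURE IJ and U+0132 LATIN CAPITAL LIGATURE IJ
# (assumed to be the <ij> and <IJ> characters under discussion).
IJ_SMALL = "\u0133"
IJ_CAPITAL = "\u0132"


def titlecase_word(word: str) -> str:
    """Titlecase one word using Python's built-in Unicode case mappings."""
    return word.title()


# Encoded as the single character, the whole <ij> is capitalised.
single = titlecase_word(IJ_SMALL + "sselmeer")

# Encoded as the two letters i + j, only the "i" is capitalised.
two_letters = titlecase_word("ijsselmeer")

print(IJ_SMALL.upper() == IJ_CAPITAL)
print(single[0] == IJ_CAPITAL)
print(two_letters[:2])
```

    With the two-letter spelling, a generic titlecase routine has no
    way to know that the "j" belongs with the "i" unless it applies a
    Dutch-specific rule.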

    The character itself is not breakable in Dutch, where it is definitely
    not a ligature but a single character with its own case conversion
    rule, exactly like the <ae> and <AE> letters (considered as
    ligatures or as unbreakable letters depending on the languages that
    use them).

    That's why <ij> and <IJ> are not canonically decomposable as
    <i, j> and <I, J> (they have only a compatibility decomposition).
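
    The difference between the two kinds of decomposition can be
    checked with Python's standard unicodedata module. This is only a
    sketch: the helper name describe is hypothetical, and the results
    depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the raw decomposition field plus the NFD and NFKD forms."""
    return {
        "decomposition": unicodedata.decomposition(ch),
        "nfd": unicodedata.normalize("NFD", ch),
        "nfkd": unicodedata.normalize("NFKD", ch),
    }


ij = describe("\u0133")        # LATIN SMALL LIGATURE IJ
e_acute = describe("\u00e9")   # LATIN SMALL LETTER E WITH ACUTE, for contrast

# <ij> carries a <compat> tag, so NFD leaves it alone and NFKD splits it.
print(ij["decomposition"])
print(ij["nfd"] == "\u0133", ij["nfkd"] == "ij")

# A canonical decomposition has no tag and is applied by NFD as well.
print(e_acute["decomposition"])
print(e_acute["nfd"] == "e\u0301")
```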

    If it had only been a shortcut character mapped for compatibility
    reasons from some 8-bit encodings, it would have been normalized
    with a canonical decomposition.

    (The exception to this rule is the Arabic ligatures, which were
    clearly and always decomposable but could not be canonically
    decomposed, because that would have required more than a character
    pair for the NFD equivalence. So they are given only an NFKD
    decomposition; their usage is strongly deprecated, and they are
    included just for an unnecessary roundtrip conversion from legacy
    Arabic encodings.)
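
    The same kind of check can be run on Arabic presentation-form
    ligatures. The two code points below (U+FEFB and U+FDF2) are
    examples picked for this sketch, not ones named in the message, and
    the helper name forms is hypothetical. The code only shows what the
    character data contains; it says nothing about why the data was
    defined that way.

```python
import unicodedata


def forms(ch: str) -> tuple:
    """Return (decomposition field, NFD form, NFKD form) for one character."""
    return (
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFD", ch),
        unicodedata.normalize("NFKD", ch),
    )


# Example presentation-form ligatures, chosen only for illustration.
LAM_ALEF = "\ufefb"   # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
ALLAH = "\ufdf2"      # ARABIC LIGATURE ALLAH ISOLATED FORM

for ch in (LAM_ALEF, ALLAH):
    decomposition, nfd, nfkd = forms(ch)
    # NFD leaves the ligature untouched; NFKD expands it to plain letters.
    print(unicodedata.name(ch), "|", decomposition)
    print("  NFD unchanged:", nfd == ch, "| NFKD length:", len(nfkd))
```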

    -- Philippe.


