Re: CLDR 1.5 beta/Unicode 5.0: character fallback substitutions

From: Mark Davis (mark.davis@icu-project.org)
Date: Fri Jun 01 2007 - 11:16:44 CDT

    On 5/31/07, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    >
    > I see a strange sentence in the specification of the new "explicit"
    > character fallback substitutions, specified in CLDR 1.5 beta
    > "characters.xml" supplementary file. It says:
    >
    > "The recommended usage is that when a character value is not in the
    > desired repertoire, the explicit substitutes from characters.xml are
    > tested one by one against the repertoire, with the first substitute
    > wholly in the repertoire being substituted for the value in the output.
    > If no explicit substitute is found, then toNFC(value) is tried; if that
    > fails then toNFKC(value) is tried."
    >
    > This definition seems to violate the current Unicode 5.0 rules, because
    > explicit fallbacks (not canonically equivalent) would take precedence over
    > NFC equivalents...
    >
    > Such a definition would mean that renderers need to be changed to try
    > fallbacks BEFORE converting the string to NFC, which significantly
    > complicates the implementation.
    >
    > I've looked at the current list of fallbacks, and in fact there is
    > currently NO case where an explicit fallback comes along with an NFC
    > fallback.
    >
    > The only significant change in those fallbacks is that there are now
    > better fallbacks than the NFKC compatibility equivalents. For example,
    > numerical fractions have an explicit fallback with a SPACE before the
    > NFKC equivalent, which works better for text like
    > "3<ONE HALF FRACTION>": it would fall back to "31/2" using NFKC, instead
    > of the better "3 1/2" with the explicit fallback.
    >
    > So shouldn't this definition read as:
    >
    > "The recommended usage is that when a character value is not in the
    > desired repertoire, then toNFC(value) is tried. If no NFC substitute is
    > found, then the explicit substitutes from characters.xml are tested one
    > by one against the repertoire, with the first substitute wholly in the
    > repertoire being substituted for the value in the output; if that fails
    > then toNFKC(value) is tried."
    >
    > Are you making this new definition for possible future fallbacks where
    > it would be better to use another, newer fallback than the current NFC
    > substitutes (which can't be changed due to NFC stability)? If so, there's
    > a need to change some of the requirements for Unicode 5.0 conformance
    > (because this affects the character identity and the semantics), or the
    > proposed new order should be merely optional.
    >
    > For now, I see no justification (after looking at the proposed list) to
    > change the order of resolution in a way that prefers breaking the
    > canonical equivalence...

    While this is not a matter of Unicode 5.0 conformance, it is a good
    suggestion. Can you file it as a bug?

    Mark
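
The two resolution orders discussed in this thread can be compared with a rough Python sketch. This is only an illustration under stated assumptions, not CLDR or ICU code: the EXPLICIT_SUBSTITUTES table is a hypothetical stand-in for characters.xml (the U+212B entry is invented purely to show where the orders would diverge, since the thread notes that no such case exists in the current list), and the helper names are made up for this example. Only unicodedata.normalize is a real library call.

```python
import unicodedata

# Hypothetical stand-in for the explicit substitutes in characters.xml.
# The U+212B entry is invented for illustration only.
EXPLICIT_SUBSTITUTES = {
    "\u00BD": [" 1/2"],  # VULGAR FRACTION ONE HALF -> SPACE + "1/2"
    "\u212B": ["A"],     # ANGSTROM SIGN -> "A" (hypothetical entry)
}


def in_repertoire(text, repertoire):
    """True if every character of text is in the target repertoire."""
    return all(ch in repertoire for ch in text)


def try_explicit(value, repertoire):
    """Return the first explicit substitute wholly in the repertoire."""
    for candidate in EXPLICIT_SUBSTITUTES.get(value, []):
        if in_repertoire(candidate, repertoire):
            return candidate
    return None


def try_normalized(form, value, repertoire):
    """Return the NFC/NFKC form if it differs and fits the repertoire."""
    candidate = unicodedata.normalize(form, value)
    if candidate != value and in_repertoire(candidate, repertoire):
        return candidate
    return None


def fallback_current(value, repertoire):
    """Order in the quoted CLDR 1.5 beta text: explicit, NFC, NFKC."""
    result = try_explicit(value, repertoire)
    if result is None:
        result = try_normalized("NFC", value, repertoire)
    if result is None:
        result = try_normalized("NFKC", value, repertoire)
    return result


def fallback_proposed(value, repertoire):
    """Order proposed in this thread: NFC, explicit, NFKC."""
    result = try_normalized("NFC", value, repertoire)
    if result is None:
        result = try_explicit(value, repertoire)
    if result is None:
        result = try_normalized("NFKC", value, repertoire)
    return result


def substitute(text, repertoire, fallback):
    """Replace each out-of-repertoire character using the given fallback."""
    out = []
    for ch in text:
        if ch in repertoire:
            out.append(ch)
        else:
            replacement = fallback(ch, repertoire)
            # Keep the original character when no fallback applies.
            out.append(ch if replacement is None else replacement)
    return "".join(out)


ASCII = {chr(i) for i in range(0x20, 0x7F)}
LATIN1 = {chr(i) for i in range(0x20, 0x100)}

# Both orders agree on the fraction, because NFC leaves U+00BD unchanged.
print(substitute("3\u00BD", ASCII, fallback_current))   # "3 1/2"
print(substitute("3\u00BD", ASCII, fallback_proposed))  # "3 1/2"

# With the hypothetical U+212B entry the orders diverge.
print(substitute("\u212B", LATIN1, fallback_current))   # "A"
print(substitute("\u212B", LATIN1, fallback_proposed))  # "\u00C5"
```

One detail the sketch makes visible: Python's NFKC form of U+00BD is "1", U+2044 FRACTION SLASH, "2", so a strictly ASCII repertoire gets no usable NFKC fallback at all for the fraction. The "31/2" result mentioned above arises only if U+2044 is in the repertoire or is itself mapped to "/".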



    This archive was generated by hypermail 2.1.5 : Fri Jun 01 2007 - 11:18:37 CDT