Re: Merging combining classes, was: New contribution N2676

From: Philippe Verdy
Date: Wed Nov 05 2003 - 20:46:07 EST

    From: "Peter Kirk"

    > It seems to me that the Unicode conformance clauses are so weak as to be
    > almost useless. An application can claim to conform to Unicode but
    > hardly do anything. A font can be sold, for example, as a Unicode Hebrew
    > font while successfully rendering only a very small part of the Hebrew
    > script. I would like to see a stronger set of conformance requirements
    > etc, so that for example an application, or a rendering system, can make
    > a claim to support Unicode version N for script X if and only if it
    > properly processes, renders etc all characters defined for script X in
    > version N according to the semantics defined in version N, and allowing
    > for canonical equivalence. Well, that's a two minute summary of an idea
    > which needs further thought. But I hope the general point comes across.
    > Without this kind of conformance guarantee we are in for a period of
    > chaos, when everyone can claim to conform to Unicode but no one has any
    > obligation to deliver anything more than the very basics.

    Is there an initiative in Israel to define the glyphs and rendering
    features required to support Hebrew, as exists in Europe with the MES
    subsets and will soon be developed for Chinese?

    This is clearly work for ISO 10646 (and related ISO standards), which is
    all about defining standard subsets, rather than for Unicode, which does
    not standardize or define any subset. For Unicode, the only subsets that
    exist are the sets of characters assigned in each numbered version, and
    each such set includes those of all earlier versions, for backward
    compatibility.

    So for me, Unicode conformance in applications does not imply any
    requirement on the supported subset, only on the standardized character
    properties and the algorithms that use them. Character blocks in Unicode
    are not subsets. ISO 10646 contains numbered subsets for these blocks,
    but nothing like this appears in Unicode, and those subsets are not
    intended to offer language or script coverage; they are a base for
    discussing the exact subsets needed for particular languages, regions
    or applications.
