Re: Tamil 0BB3 and 0BD7

From: Philippe Verdy (
Date: Mon Nov 10 2003 - 08:19:43 EST

    From: "Kent Karlsson" <>
    > > From: "Kent Karlsson" <>
    > >
    > > > The Indic "length marks" should be seen as encoding mistakes.
    > >
    > > Could they be documented officially as deprecated in favor of another
    > > character, by assigning them a compatibility decomposition
    > > mapping (I mean with <compat>XXXX in the UCD)?
    > By now you should know perfectly well that they cannot.
    > The decompositions cannot be changed.

    Is that true for compatibility decompositions as well? When I read the
    Unicode stability policy, I understood it to cover only the canonical
    mappings, plus the rule that a canonical mapping cannot be changed into a
    compatibility mapping or the reverse, so that the mapping remains stable.

    Under point #4, we have this sentence:

        Particularly in the situation where the Unicode Standard first
        encodes less-well documented characters and scripts, the
        exact character properties and behavior initially may not be
        well known.(...)

    This is our case.

        (...)As more experience is gathered in implementing the characters,
        adjustments in the properties may become necessary. Examples
        of such properties include, but are not limited to, the following:
          * General category
          * Case mappings
          * Bidi properties
          * Compatibility decomposition tags (e.g. <font> vs. <compat>)
          * Representative glyphs

    So, as the change in AU length mark does not affect its identity,
    the compatibility decomposition tag may be added.
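
    For reference, the tags named in the policy are the bracketed prefixes
    in the decomposition field of the UCD. Here is a minimal sketch using
    Python's standard unicodedata module; the helper name decomposition_tag
    is only illustrative, and the results depend on the Unicode version
    bundled with the interpreter:

```python
import unicodedata

def decomposition_tag(ch):
    """Return 'canonical', a compatibility tag such as '<compat>', or None.

    Illustrative helper (not part of any standard API): it only inspects
    the decomposition field exposed by unicodedata.decomposition().
    """
    field = unicodedata.decomposition(ch)
    if not field:
        return None  # the character has no decomposition mapping at all
    first = field.split()[0]
    # A compatibility mapping starts with a tag in angle brackets;
    # a canonical mapping is a bare list of code points.
    return first if first.startswith("<") else "canonical"

for ch in ("\u2102", "\ufb01", "\u0bcc", "\u0bd7"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{unicodedata.decomposition(ch)!r} -> {decomposition_tag(ch)}")
```

    The AU length mark U+0BD7 has an empty decomposition field, which is
    the field the suggestion above would fill in.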

    Maybe I'm wrong here. But this does not prevent Unicode from saying
    that length marks should be deprecated, like some other characters.
    Of course this would require an equivalent update in the ISCII
    standard from which these characters were taken: what if ISCII
    now says that length marks are deprecated for use in a given list
    of scripts where they are used? Shouldn't the same happen in Unicode?

    It would also be a useful mapping for applications that must be
    scrupulous about effective character identity (notably IDNA, where
    it is a security issue: IDNA implementations will probably need to
    add this mapping as part of the NamePrep process...)

    > And since these chars are part of the decompositions of actually useful
    > characters, these "length marks" cannot be deprecated or use-discouraged.

    With compatibility mappings we don't remove any canonical distinctions, so
    the stability of normalized strings is kept (except under the compatibility
    normalization forms, which however often remove some distinctions that are
    not essential to the character identity)...
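
    To make the distinction concrete, here is a small sketch with Python's
    standard unicodedata module. It assumes the Unicode data shipped with
    the interpreter, in which U+0BD7 has no decomposition of its own; the
    helper codepoints is only for display. It shows that the length mark
    appears in the canonical decompositions of the AU vowel sign and the
    letter AU, and that the canonical and compatibility forms agree here:

```python
import unicodedata

def codepoints(s):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(c):04X}" for c in s)

samples = {
    "TAMIL VOWEL SIGN AU": "\u0bcc",
    "TAMIL LETTER AU": "\u0b94",
    "TAMIL AU LENGTH MARK": "\u0bd7",
}

for name, ch in samples.items():
    nfd = unicodedata.normalize("NFD", ch)    # canonical decomposition
    nfkd = unicodedata.normalize("NFKD", ch)  # compatibility decomposition
    print(f"{name}: NFD = {codepoints(nfd)}, NFKD = {codepoints(nfkd)}")
```

    A compatibility mapping added to U+0BD7 would change only the NFKD
    (and NFKC) column; the NFD results would stay as they are.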

    Deprecating a character would mean that implementations are encouraged,
    wherever possible, to treat legacy texts encoded with length marks
    identically with those coded with separate letters. But it does not
    constitute a requirement for conformance.

    This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 09:01:33 EST