Re: Uppercase ß is coming? (U+1E9E)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 03 2007 - 18:21:39 CST


    Mark said,

    > In practice, I don't think this new character need cause any particular
    > problems for searching. It can compatibly have the relation
    >
    > lowercase(capital-ß) = ß
    >
    > That means that we would make it a case-folding variant of ß,

    I agree with Mark up to this point.

    > and a
    > collation variant of ß.

    This does not follow as a consequence of that, however.

    The UnicodeData.txt entry for U+00DF LATIN SMALL LETTER SHARP S is:

    00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;

    In other words, it has no simple case mapping, nor does it
    have a compatibility decomposition to <s, s>. CaseFolding.txt
    does provide a full case folding for it.
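
    To make the empty fields easier to see, here is a minimal Python
    sketch that splits the entry quoted above on semicolons. The field
    labels follow the UnicodeData.txt layout as I understand it; the
    helper name parse_unicodedata_line is hypothetical and used only
    for illustration.

```python
# Labels for the 15 semicolon-separated fields of a UnicodeData.txt record.
# The label spellings are illustrative, not official property names.
FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "simple_upper", "simple_lower",
    "simple_title",
]


def parse_unicodedata_line(line):
    """Split one UnicodeData.txt record into a dict keyed by field label."""
    values = line.strip().split(";")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


# The record quoted in the message above.
entry = parse_unicodedata_line(
    "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;"
)

# An empty string means the property has no value for this character.
has_decomposition = entry["decomposition"] != ""
has_simple_case_mapping = any(
    entry[key] for key in ("simple_upper", "simple_lower", "simple_title")
)
```

    Both has_decomposition and has_simple_case_mapping come out false
    for this record, which is the point being made above.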

    For collation, a specific weighting is added to the DUCET,
    *not* based on UnicodeData.txt, to result in:

    00DF ; [.11AF.0020.0004.00DF][.0000.0199.0004.00DF][.11AF.0020.001F.00DF] # LATIN SMALL LETTER SHARP S

    The sequence of two <s> weights, plus the constructed secondary
    weight for the first <s>, is completely the result of
    deliberate introduction of this weight in the DUCET.

    The same thing would have to be done, deliberately, to
    get the UCA to weight the LATIN CAPITAL LETTER SHARP S as
    equivalent to a secondary-weighted <S, S> sequence,
    thus resulting in the expected behavior for sorting
    and searching.
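
    As a rough illustration of what that DUCET entry encodes, the
    following Python sketch pulls the collation elements out of the
    line quoted above and collects the non-zero primary weights. The
    regular expression only covers the bracketed four-field syntax
    seen in this one entry, and the function names are hypothetical.

```python
import re

# Matches one collation element such as [.11AF.0020.0004.00DF].
# The leading marker is "." for a regular element or "*" for a variable one.
CE_PATTERN = re.compile(
    r"\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.[0-9A-F]{4,6}\]"
)


def parse_collation_elements(entry):
    """Return (primary, secondary, tertiary) tuples for one allkeys-style entry."""
    return [
        tuple(int(group, 16) for group in match.groups())
        for match in CE_PATTERN.finditer(entry)
    ]


def primary_key(elements):
    """Collect the non-zero primary weights, as a level-1 comparison would."""
    return [primary for primary, _, _ in elements if primary != 0]


# The entry quoted in the message above.
SHARP_S_ENTRY = (
    "00DF ; [.11AF.0020.0004.00DF][.0000.0199.0004.00DF][.11AF.0020.001F.00DF]"
)

elements = parse_collation_elements(SHARP_S_ENTRY)
primaries = primary_key(elements)
```

    Under these assumptions the entry yields three collation elements:
    two carrying the primary weight 11AF, with a primary-ignorable
    element in between that carries the constructed secondary weight
    0199. At the primary level that leaves just the two 11AF weights.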

    > We would still keep the uppercase mapping:
    >
    > uppercase(ß) = SS

    I agree that that would be required for stability.
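
    For readers who want to try the relations Mark describes, here is
    a small Python 3 sketch. It assumes an interpreter whose bundled
    Unicode database already includes U+1E9E, which was only a
    proposal when this thread was written, so it shows how the
    relations look in such an implementation rather than anything
    decided here. The function name case_relations is hypothetical.

```python
SMALL_SHARP_S = "\u00df"    # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def case_relations():
    """Report the case relations discussed in the thread, as Python sees them."""
    return {
        # lowercase(capital-ß) = ß
        "lower(capital)": CAPITAL_SHARP_S.lower(),
        # uppercase(ß) = SS, kept for stability
        "upper(small)": SMALL_SHARP_S.upper(),
        # Full case folding of either form
        "casefold(small)": SMALL_SHARP_S.casefold(),
        "casefold(capital)": CAPITAL_SHARP_S.casefold(),
    }


relations = case_relations()

# Caseless matching succeeds when both strings fold to the same result.
folds_alike = "STRA\u1e9eE".casefold() == "stra\u00dfe".casefold()
```

    Note that str.upper() does not return the capital sharp s for ß,
    consistent with keeping uppercase(ß) = SS.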

    >
    > Mark
    >
    > On 5/3/07, John Hudson <john@tiro.ca> wrote:

    > > [The proposal recommends for discussion a possible compatibility
    > > decomposition to 'U+0053
    > > U+0053' to 'provide for the equivalence of the character sequences
    > > "capital ß" and "SS" in
    > > those applications that use the Normalization Form KD or KC for the
    > > detection of sameness
    > > of names etc.' How viable is this?]

    In response to John on that point, I don't think it is viable
    at all. Remember that U+00DF itself doesn't have a compatibility
    decomposition either. The equivalence in terms of searching
    is handled, instead, by the special treatment in the DUCET
    table for UCA (and equivalently in the CTT for ISO 14651, of
    course).
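
    The point about U+00DF can be checked with Python's standard
    unicodedata module. This is only a sketch: same_under_nfkd is a
    hypothetical helper, and U+FB01 LATIN SMALL LIGATURE FI is included
    purely as a contrasting character that does have a compatibility
    decomposition.

```python
import unicodedata


def same_under_nfkd(a, b):
    """Compare two strings after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# An empty string means the character has no decomposition mapping at all.
sharp_s_decomposition = unicodedata.decomposition("\u00df")

# NFKD leaves the sharp s untouched, so it is never detected as "ss".
sharp_s_matches_ss = same_under_nfkd("\u00df", "ss")

# Contrast: the fi ligature does carry a <compat> decomposition to "fi".
ligature_matches_fi = same_under_nfkd("\ufb01", "fi")
```

    Normalization alone therefore never equates ß with ss, which is
    why the equivalence has to come from the collation weights instead.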

    --Ken


