Re: Looking for transcription or transliteration standards latin->arabic

From: Michael (michka) Kaplan
Date: Fri Jul 09 2004 - 09:40:46 CDT

    From: "Peter Kirk"

    > But Kaplan is referring to something quite different, optionally
    > ignoring diacritics in search operations. This is indeed desirable, so
    > that a single search can match both Dvorak and Dvořák for example, and
    > so that the one doing the search does not need to remember exactly which
    > diacritics are used in the name. And it is already covered by the
    > Unicode collation algorithm and default table, in which diacritics are
    > distinguished only at the second level and so folded by a top level only
    > collation.

    (a) If this were true and it were the only need, then case folding would
    also just be "a UCA issue", yet case folding is in the document.

    (b) Not everyone who uses Unicode uses the UCA. Most of the corporate
    member companies in Unicode, including IBM, had alternate collation
    methods in their databases and operating systems that existed prior to
    the UCA and that to this day support more languages.

    (c) Since the operation (diacritic folding) is a valid one that
    implementations may want to do and the UCA is a UTS and thus not required
    for Unicode conformance, it is a sensible folding operation to define.
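
    To illustrate point (c), here is a minimal sketch of diacritic folding
    done as a standalone operation, with no collation tables involved. It
    assumes that "folding" means canonical decomposition followed by
    removal of nonspacing marks, which is only one possible definition; the
    names fold_diacritics and folded_find are hypothetical and not taken
    from any specification.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip nonspacing marks after canonical decomposition.

    Assumption: this treats every character of general category Mn as a
    diacritic. Letters with no canonical decomposition (for example "ø"
    or "ł") pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    return unicodedata.normalize("NFC", "".join(kept))


def folded_find(needle: str, haystack: str) -> bool:
    """Substring search that ignores diacritics on both sides."""
    return fold_diacritics(needle) in fold_diacritics(haystack)


if __name__ == "__main__":
    print(fold_diacritics("Dvořák"))
    print(folded_find("Dvorak", "Antonín Dvořák"))
```

    With this, a search for "Dvorak" also finds "Dvořák", which is the
    behavior the quoted text asks for, without requiring the UCA.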

    Does diacritic folding destroy the information that diacritics carry?
    Of course it does. But then again, the same can be said of all
    foldings. This does not diminish their potential usefulness in
    specific tasks/operations.
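
    The loss is easy to see by grouping strings under their folded key. The
    sketch below assumes Python's built-in str.casefold for case folding
    and a simple strip-the-marks function for diacritic folding; the helper
    names strip_marks and group_by are hypothetical.

```python
import unicodedata
from collections import defaultdict


def strip_marks(text: str) -> str:
    """Assumed diacritic folding: decompose, then drop nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def group_by(fold, words):
    """Map each folded key to the original strings that collapse onto it."""
    groups = defaultdict(list)
    for word in words:
        groups[fold(word)].append(word)
    return dict(groups)


words = ["resume", "résumé", "Resume"]

# Case folding merges "resume" and "Resume" but keeps "résumé" apart.
by_case = group_by(str.casefold, words)

# Diacritic folding merges "resume" and "résumé" but keeps "Resume" apart.
by_marks = group_by(strip_marks, words)

if __name__ == "__main__":
    print(by_case)
    print(by_marks)
```

    Each folding is many-to-one in its own way: once the key is computed,
    the original spelling cannot be recovered from it, yet the key is
    exactly what a tolerant search needs.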

    MichKa [MS]
    NLS Collation/Locale/Keyboard Development
    Globalization Infrastructure and Font Technologies
    Windows International Division
