Re: Looking for transcription or transliteration standards latin->arabic

From: Peter Kirk
Date: Fri Jul 09 2004 - 03:12:33 CDT

    On 09/07/2004 01:41, Michael (michka) Kaplan wrote:

    >From: "Michael Everson"
    >>I think it's stupid (in general) to argue for stripping a letter of
    >>diacritics. If a reader is ignorant of their meaning, that can be
    >>cured. But if they are meaningful, stripping them is just misspelling
    >>the words they belong to. Why would anyone want to do that?
    >I think it's inadvisable (in general) to call things stupid merely because
    >one does not see the need. On the whole, that is a better time to ask the
    >question than to make the judgment.
    >There is actually a great deal of both European and American data (in
    >programs like Microsoft Exchange and Outlook, as well as in web search)
    >showing that folding away diacritics as a part of giving full lists of
    >possible matches is indeed preferred by users. Now they would (also) prefer
    >the exact matches to have priority, but having additional matches without
    >the diacritics is a common request, and one that has been built into many
    >scenarios.

    It seems to me that you two Michaels are talking at cross purposes.

    Everson was apparently referring to the practice of stripping diacritics
    from foreign words as rendered typographically, e.g. in magazines and
    presumably online texts. And I tend to agree with him (from my European
    perspective) that this is unnecessary. On the other hand, if some people
    want to do it, they should not be prevented.

    But Kaplan is referring to something quite different: optionally
    ignoring diacritics in search operations. This is indeed desirable, so
    that a single search can match both Dvorak and Dvořák, for example, and
    so that the one doing the search does not need to remember exactly which
    diacritics are used in the name. And it is already covered by the
    Unicode collation algorithm and default table, in which diacritics are
    distinguished only at the second level and so folded by a top-level-only
    comparison.
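
    To make the search case concrete, here is a minimal Python sketch. It
    does not implement the Unicode collation algorithm or use its default
    table; it only approximates a top-level comparison by decomposing each
    string (NFD), dropping combining marks, and case-folding. The names
    fold_diacritics and search are hypothetical, chosen for illustration.
    Exact matches are listed first, as Kaplan says users prefer.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD).

    This is an approximation of a primary-level comparison key,
    not the Unicode collation algorithm itself.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(query: str, candidates: list[str]) -> list[str]:
    """Return candidates matching the query with diacritics and case ignored.

    Exact matches sort ahead of folded matches; the sort is stable, so the
    original order is otherwise preserved.
    """
    key = fold_diacritics(query).casefold()
    matches = [c for c in candidates if fold_diacritics(c).casefold() == key]
    return sorted(matches, key=lambda c: c != query)


if __name__ == "__main__":
    names = ["Dvorak", "Dvořák", "Smetana"]
    print(search("Dvořák", names))
    print(search("dvorak", names))
```

    Letters that have no canonical decomposition (ø or ł, for instance) are
    left untouched by this approach, whereas a real collation table may
    still group them with their base letters, so a proper collation library
    is the better choice for production use.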

    Peter Kirk
