RE: A .Net Unicode Puzzle

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 05 2007 - 18:10:43 CST


    > What a strange idea... which would have the bad effect of creating lots of
    > ambiguities, or unpronounceable and unrecognizable words (it won't even help
    > English users).

    etc., etc.

    I don't see the point in respondents here on this list getting
    so snippy and snide about the original posting. Maybe some people have
    reasons for comparing Latin strings without their diacritics.
    Yes, maybe they could do something more sophisticated with
    an ICU collator, but then again, maybe they don't want something
    more sophisticated.

    And nobody is telling people to spell French by removing the
    accents, by the way.

    There is a much better (and less chip-on-the-shoulder) discussion
    of the .NET topic on Michael Kaplan's blog:

    http://blogs.msdn.com/michkap/archive/2005/02/19/376617.aspx

    Oh, and if anybody had bothered to track back to that discussion,
    they would have seen:

    A) People were aware that Latin letters with diacritics that
       don't have decompositions can't be treated this way simply
       by doing an NFD (or NFKD) decomposition.
       
    B) Nobody was proposing that Indic scripts be "vandalized" by
       stripping out combining marks. This was an exercise in
       folding Latin letters by removing accents.
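
    To make points A and B concrete, here is a minimal sketch in Python
    (not the .NET code from the blog discussion). The function name is
    made up for illustration, and limiting removal to the Combining
    Diacritical Marks block U+0300..U+036F is an assumption chosen so
    that Indic combining marks are left alone.

```python
import unicodedata

def strip_combining_diacriticals(text: str) -> str:
    """Drop marks in U+0300..U+036F after canonical decomposition (NFD).

    Hypothetical helper for illustration. Any letter, Latin or not, that
    decomposes to a base plus one of these marks loses the mark.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if not ("\u0300" <= ch <= "\u036f")]
    # Recompose so that untouched text comes back in NFC form.
    return unicodedata.normalize("NFC", "".join(kept))

print(strip_combining_diacriticals("résumé"))  # accents removed
print(strip_combining_diacriticals("Łódź"))    # Ł has no decomposition and survives
print(strip_combining_diacriticals("कु"))       # Devanagari vowel sign is kept
```

    Letters such as Ø, ø, Ł, ł, Đ and đ have no canonical
    decomposition, so decomposition alone never exposes a mark to
    remove for them.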
       
    Incidentally, in case anybody wasn't paying attention, the draft
    UTR #30 on Character Foldings:

    http://www.unicode.org/reports/tr30/

    has talked for some time now both about "accent removal" folding
    and "diacritic removal" folding -- and even has a provisional
    data file: DiacriticFolding.txt, to assist in the latter.
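
    As a rough illustration of how a "diacritic removal" folding can
    go beyond decomposition, the sketch below adds a small lookup
    table for letters that have no decomposition. The table and the
    names EXTRA_FOLDS and fold_diacritics are hand-written
    assumptions for this example; they are not taken from
    DiacriticFolding.txt or from UTR #30.

```python
import unicodedata

# Tiny hand-written table (an assumption, NOT the contents of
# DiacriticFolding.txt) for letters with no canonical decomposition.
EXTRA_FOLDS = {"Ø": "O", "ø": "o", "Ł": "L", "ł": "l", "Đ": "D", "đ": "d"}

def fold_diacritics(text: str) -> str:
    """Remove marks in U+0300..U+036F, then apply the extra table."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if "\u0300" <= ch <= "\u036f":
            continue  # a combining diacritical mark exposed by NFD
        out.append(EXTRA_FOLDS.get(ch, ch))
    return unicodedata.normalize("NFC", "".join(out))

print(fold_diacritics("Łódź"))
print(fold_diacritics("Ørsted"))
```

    A real folding would take its mappings from a maintained data
    file rather than a six-entry dictionary.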

    --Ken


