Telugu vs Kannada confusables

From: Shriramana Sharma (samjnaa@gmail.com)
Date: Wed Nov 24 2010 - 23:27:38 CST


    Hello. Here's a Telugu vs Kannada confusables list I have just put
    together. As this is an important security issue, I am posting it to
    all the lists so that people may contribute. Some of these are probably
    already listed, but I am aiming for completeness:

    ANUSVARA
    VISARGA
    LETTER A
    LETTER AA
    LETTER I
    LETTER II
    LETTER VOCALIC L
    LETTER E
    LETTER EE
    LETTER AI
    LETTER O
    LETTER OO
    LETTER AU
    LETTER KHA
    LETTER GA
    LETTER GHA (?)
    LETTER NGA (?)
    LETTER JA
    LETTER JHA
    LETTER NYA
    LETTER TTA
    LETTER TTHA
    LETTER DDA
    LETTER DDHA
    LETTER NNA
    LETTER TA (?)
    LETTER THA
    LETTER DA
    LETTER DHA
    LETTER NA
    LETTER PA (?)
    LETTER PHA (?)
    LETTER BA
    LETTER BHA
    LETTER MA
    LETTER YA
    LETTER RA
    LETTER RRA
    LETTER LA
    LETTER LLA
    LETTER VA
    LETTER SHA
    LETTER SSA (?)
    LETTER SA (?)
    VOWEL SIGN AA
    VOWEL SIGN I
    VOWEL SIGN U
    VOWEL SIGN UU
    VOWEL SIGN VOCALIC R
    VOWEL SIGN VOCALIC RR
    VOWEL SIGN AU
    LETTER VOCALIC LL
    VOWEL SIGN VOCALIC L
    VOWEL SIGN VOCALIC LL (?)
    DIGIT ZERO
    DIGIT ONE
    DIGIT TWO
    DIGIT FOUR
    DIGIT FIVE (?)
    DIGIT SIX
    DIGIT EIGHT (?)
    DIGIT NINE

    That makes sixty-two characters in all, including the ones marked ?.
    Even *without* the ones marked ? (which I marked because I suspect
    others may contest them), it comes to fifty-two.
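    Both blocks follow the same ISCII-derived layout, so each name above
    picks out one code point in the Telugu block (U+0C00..U+0C7F) and one
    in the Kannada block (U+0C80..U+0CFF), exactly 0x80 apart. A minimal
    Python sketch using the standard unicodedata module prints a few of
    these pairs; the subset is chosen only for illustration, and the formal
    Unicode names of ANUSVARA and VISARGA carry a "SIGN" prefix.

```python
import unicodedata

# Illustrative subset of the confusables list above (not the full list).
# Formal names need "SIGN" before ANUSVARA/VISARGA.
SUBSET = [
    "SIGN ANUSVARA",
    "LETTER A",
    "LETTER GA",
    "LETTER JA",
    "LETTER DDA",
    "LETTER BA",
    "LETTER RRA",
    "VOWEL SIGN AA",
    "VOWEL SIGN AU",
    "DIGIT ZERO",
    "DIGIT SIX",
]


def pair(name):
    """Return the (Telugu, Kannada) code points whose names end in `name`."""
    te = ord(unicodedata.lookup("TELUGU " + name))
    kn = ord(unicodedata.lookup("KANNADA " + name))
    return te, kn


for name in SUBSET:
    te, kn = pair(name)
    # Print both code points side by side with the offset between them.
    print(f"U+{te:04X} <-> U+{kn:04X}  (+0x{kn - te:X})  {name}")
```

    Every pair differs by the same 0x80 offset, which is why a list of
    shared character names like the one above maps directly onto pairs of
    code points.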

    Now to count the characters that are NOT common or confusable (obviously far fewer):

    LETTER U
    LETTER UU
    LETTER VOCALIC R
    LETTER KA
    LETTER CA
    LETTER CHA
    LETTER HA
    VOWEL SIGN II
    VOWEL SIGN E
    VOWEL SIGN EE
    VOWEL SIGN AI
    VOWEL SIGN O
    VOWEL SIGN OO
    LETTER VOCALIC RR
    DIGIT THREE
    DIGIT SEVEN

    That comes to sixteen.

    I left out the LLLA of Kannada and the fractions of Telugu, which are
    *not* present (as of Unicode 6.0) in the other script, because
    obviously there can be no comparison for those.

    So there are at least *thrice* (and at most *four times*) as many
    confusable characters between Kannada and Telugu as there are
    NON-confusables.
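    The two bounds are just the confusable counts (52 without the ? cases,
    62 with them) divided by the sixteen non-confusables. A quick check
    with Python's fractions module:

```python
from fractions import Fraction

CONFUSABLE_ALL = 62         # including the entries marked (?)
CONFUSABLE_UNDISPUTED = 52  # excluding the ten entries marked (?)
NON_CONFUSABLE = 16

low = Fraction(CONFUSABLE_UNDISPUTED, NON_CONFUSABLE)
high = Fraction(CONFUSABLE_ALL, NON_CONFUSABLE)

print(low, float(low))    # 13/4 3.25
print(high, float(high))  # 31/8 3.875
```

    Both ratios fall between 3 and 4, which is where "at least thrice, at
    most four times" comes from.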

    Now can you beat that! Speaking of scripts that share a common origin
    and cause potential confusion in IDNs, *I* say Kannada and Telugu take
    the cake!
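    To make the IDN risk concrete, here is a much-simplified sketch in the
    spirit of the "skeleton" comparison of Unicode Technical Standard #39:
    listed Telugu characters are folded onto their Kannada counterparts,
    and two different labels whose folded forms match are flagged. The set
    TELUGU_CONFUSABLE and the helpers skeleton() and looks_alike() are
    hypothetical, built from a small subset of the list above; a real
    registry would use the published confusables.txt data and the full
    UTS #39 algorithm (which also applies NFD normalization).

```python
import unicodedata

# Hypothetical subset of the confusables above, as Telugu code points.
# A real deployment would load Unicode's confusables.txt instead.
TELUGU_CONFUSABLE = {
    ord(unicodedata.lookup("TELUGU " + n))
    for n in ("LETTER A", "LETTER GA", "LETTER JA", "LETTER BA",
              "LETTER MA", "LETTER YA", "LETTER RA", "LETTER LA",
              "LETTER VA", "VOWEL SIGN AA")
}
KANNADA_OFFSET = 0x80  # distance between the parallel Telugu and Kannada blocks


def skeleton(label: str) -> str:
    """Fold listed Telugu characters onto their Kannada counterparts."""
    return "".join(
        chr(ord(ch) + KANNADA_OFFSET) if ord(ch) in TELUGU_CONFUSABLE else ch
        for ch in label
    )


def looks_alike(a: str, b: str) -> bool:
    """True if two distinct labels share the same (simplified) skeleton."""
    return a != b and skeleton(a) == skeleton(b)


# Telugu BA + vowel sign AA vs. Kannada BA + vowel sign AA.
print(looks_alike("\u0C2C\u0C3E", "\u0CAC\u0CBE"))  # True
```

    Characters outside the listed subset (Telugu KA, for instance) pass
    through skeleton() unchanged, so they never trigger a match.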

    Shriramana Sharma.



    This archive was generated by hypermail 2.1.5 : Wed Nov 24 2010 - 23:29:53 CST