Re: IDN problem.... :(

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Feb 11 2005 - 03:56:25 CST

    I did not read the bugzilla thread.

    On Friday, February 11th, 2005 04:48Z Murray Sargent wrote:
    > It's the alphabetic characters of
    > Latin, Greek and Cyrillic that shouldn't be mixed, or the user may
    > suffer consequences no user should have to endure.

    I think I remember seeing a Cyrillic Q being registered, or on track to
    be registered (sorry, too lazy to check the code; I am sure
    someone will answer this post and give it). Surely this means that _this_ Q
    letter will not be a problem: one should _not_ have to use a Latin Q among
    Cyrillic letters just to have one's name written correctly (which is, in
    the end, the very point of IDN).

    However, it also means that linguists working on lesser-used languages do
    NOT stop at script frontiers: they globalize, they DO mix characters from
    differing "alphabets" in order to accommodate unexpected uses. Saying that
    Unicode should register the "new" use of a character _before_ the
    name can be registered is just going to make people unhappy about
    lengthy procedures, and it also puts a bit more pressure on Unicode and
    WG2, unnecessarily.

    Also, determining the frontier is not an easy job in general. Of course it
    should be fairly obvious for Latin, Cyrillic and Greek, but when you consider
    Japanese, which mixes three scripts, things are a bit different; and when one
    comes to the Indian scripts, where Devanagari signs are re-used with the
    other scripts for Sanskrit... Also consider how to deal with Coptic vs.
    Greek. All these strange cases will have to be dealt with in software. So
    first, it will take several years for all the IDN libraries to get it right
    (with the piles of upset users, bug reports and upset maintainers), and in
    the meantime it would make perfect terrain for hackers, in much the same
    way as we had problems a few years ago with malformed UTF-8 strings.
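
    To make the frontier problem concrete, here is a minimal sketch (not
    part of the original post) of a naive mixed-script check. It assumes
    that the first word of a character's Unicode name is a usable stand-in
    for its script, which is only a heuristic: Python's standard library
    does not expose the real Script property. The names rough_script,
    scripts_in and is_mixed are hypothetical helpers.

```python
import unicodedata

def rough_script(ch):
    """Guess a script from the first word of the Unicode character name.

    Heuristic only: this is NOT the Script property from the Unicode
    Character Database, just a cheap approximation for illustration.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def scripts_in(label):
    """Return the set of rough scripts used by the letters of a label."""
    return {rough_script(ch) for ch in label if ch.isalpha()}

def is_mixed(label):
    """True when the letters of the label come from more than one rough script."""
    return len(scripts_in(label)) > 1

if __name__ == "__main__":
    samples = (
        "paypal",                                   # all Latin
        "p\u0430ypal",                              # second letter is Cyrillic
        "\u65e5\u672c\u8a9e\u306e\u30c9\u30e1\u30a4\u30f3",  # ordinary Japanese
    )
    for label in samples:
        print(ascii(label), sorted(scripts_in(label)), is_mixed(label))
```

    Under this heuristic an ordinary Japanese label counts as "mixed"
    (ideographs, hiragana and katakana), so a blanket ban on mixing would
    reject legitimate names unless each such case is special-cased.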

    While the Greek equivalent (ραγραΙ) does not look anywhere near as
    attractive as Addison's example, I notice that the narrow characters do
    not seem to be outlawed (I hope I missed something here ;-)), so we also have
    paypal (looks funny but correct) :(. Nor do I see any restriction on the
    use of payp@l (which will certainly have its share of success in countries
    where there has been a lot of hype about the Internet, such as here in Spain).
    Similarly paypaɭ, or even just paypaŀ or paypał or payp⒜l.
    And of course there are examples that pose no problem at all once people
    get the correct fonts, like ᏢᎪᎩᏢᎪᏞ (Cherokee).
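
    Since these lookalikes are hard to tell apart by eye, a small sketch
    (not from the original post) that lists the code point and Unicode name
    of each character can help when inspecting a suspicious label. The
    helper name describe is hypothetical.

```python
import unicodedata

def describe(label):
    """Return one 'U+XXXX NAME' string per character of the label."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in label
    ]

if __name__ == "__main__":
    # "paypa" followed by U+0142, one of the lookalikes listed above.
    for line in describe("paypa\u0142"):
        print(line)
```

    Applied to the Cherokee example, every line starts with CHEROKEE, i.e.
    the whole string stays within a single script.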

    As Allison said, it is just a game of cat and mouse: disallowing mixed
    scripts is (would have been, really) NOT the definitive solution; it will
    just require the evildoers to be a little bit more clever. As the first
    example shows clearly, since there could be money behind it, we can assume
    they _will_ be clever.

    OTOH, highlighting punycode in the address bar seems a good idea to me.
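
    As an illustration of what such highlighting could look like, here is a
    minimal sketch (not from the original post) using Python's built-in
    "idna" codec, which implements the RFC 3490 conversion. The helper names
    ace_form and display_form, and the bracketed display format, are
    assumptions made for the example, not how any particular browser does it.

```python
def ace_form(hostname):
    """Return the ASCII-compatible (Punycode) form of a hostname.

    Uses the standard-library 'idna' codec (IDNA 2003: Nameprep + Punycode).
    """
    return hostname.encode("idna").decode("ascii")

def display_form(hostname):
    """Show the ASCII form next to the name whenever the two differ."""
    ace = ace_form(hostname)
    if ace != hostname:
        return f"{hostname} [{ace}]"
    return hostname

if __name__ == "__main__":
    print(display_form("paypal.com"))        # plain ASCII, shown unchanged
    print(display_form("p\u0430ypal.com"))   # Cyrillic letter, ASCII form appended
```

    A plain ASCII name is displayed as-is, while any name containing
    non-ASCII letters also exposes its xn-- form, which makes a spoofed
    label visibly different from the genuine one.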

    Also, one might look back at RFC 3454 (StringPrep) while discussing
    this issue. This RFC says, among other things:
    : 9.1 Stringprep-specific security considerations
    :
    : The Unicode and ISO/IEC 10646 repertoires have many characters that
    : look similar. In many cases, users of security protocols might do
    : visual matching, such as when comparing the names of trusted third
    : parties. Because it is impossible to map similar-looking characters
    : without a great deal of context such as knowing the fonts used,
    : stringprep does nothing to map similar-looking characters together
    : nor to prohibit some characters because they look like others. User
    : applications can help disambiguate some similar-looking characters by
    : showing the user when a string changes between scripts.
    [more interesting text follows].
    Another version of this particular passage is also in RFC 3491 (NamePrep),
    a more direct reference for an implementor.
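
    The behaviour described in the quoted section can be observed with the
    nameprep function in Python's encodings.idna module, which applies the
    RFC 3491 profile. This is a minimal sketch, not part of the original
    post; same_after_nameprep is a hypothetical helper.

```python
from encodings.idna import nameprep

def same_after_nameprep(a, b):
    """True when two labels become identical after Nameprep."""
    return nameprep(a) == nameprep(b)

if __name__ == "__main__":
    # Case differences are folded together by Nameprep...
    print(same_after_nameprep("PayPal", "paypal"))
    # ...but a Cyrillic lookalike letter stays a distinct label.
    print(same_after_nameprep("p\u0430ypal", "paypal"))
```

    Case is folded, but a visually similar letter from another script is
    left untouched, which is exactly the gap the security considerations
    warn about.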

    Antoine


