Re: IDN Security, recommended character ranges blacklist

From: Neil Harris (neil@tonal.clara.co.uk)
Date: Tue Feb 15 2005 - 07:50:36 CST

    Patrick Andries wrote:

    > Peter Kirk wrote:
    >
    >> On 15/02/2005 11:41, Neil Harris wrote:
    >>
    >>> ...
    >>>
    >>> In http://www.icann.org/committees/idn/idn-codepoint-input.htm, IANA
    >>> recommend blacklisting the following Unicode character ranges as
    >>> unusable within IDNs:
    >>>
    >>> ...
    >>>
    >>> I would go further, and add:
    >>>
    >>> * Spacing Modifier Letters
    >>> ...
    >>> * IPA Extensions
    >>>
    >> There should not be a blanket ban on using these characters in IDNs,
    >> as some characters in both of these ranges are used as part of the
    >> regular orthography of some modern languages. For example,
    >> Azerbaijani (Latin) uses U+0259 LATIN SMALL LETTER SCHWA and
    >> (officially) U+02BC MODIFIER LETTER APOSTROPHE.
    >
    >
    > I agree: banning IPA Extensions is a bad idea. I mentioned this
    > yesterday: living African languages use IPA Extensions characters,
    > for instance U+0253 in the Pan-Nigerian alphabet.
    >
    > P. A.
    >
    Okay, let's get rid of those two ranges, and only blacklist:

    * Box Drawing
    * Block Elements
    * Geometric Shapes
    * Miscellaneous Symbols
    * Dingbats
    * Byzantine Musical Symbols
    * Musical Symbols
    * Mathematical Alphanumeric Symbols
    * Letterlike Symbols
    * Number Forms
    * Arrows
    * Mathematical Operators
    * Miscellaneous Technical
    * Combining Marks for Symbols
    * Control Pictures
    * Optical Character Recognition
    * Enclosed Alphanumerics
    * Miscellaneous Mathematical Symbols-A
    * Supplemental Arrows-A
    * Supplemental Arrows-B
    * Miscellaneous Mathematical Symbols-B
    * Supplemental Mathematical Operators
    * Miscellaneous Symbols and Arrows
    * High Surrogates
    * Low Surrogates
    * Private Use Area
    * Alphabetic Presentation Forms
    * Small Form Variants
    * Halfwidth and Fullwidth Forms
    * Variation Selectors
    * Tags
    * Specials
    * Variation Selectors Supplement
    * Supplementary Private Use Area-A
    * Supplementary Private Use Area-B
    * Linear B Syllabary
    * Linear B Ideograms
    * Shavian
    * Deseret
    * Ugaritic
    * Old Italic
    * Ogham
    * Runic
    * General Punctuation

    This blacklist can serve as a belt-and-braces backstop to a whitelist
    solution, particularly in cases where a registrar does not comply with
    the proper whitelist guidelines.
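
    As a rough sketch of how such a backstop check might look, the Python
    below flags any character of a (Unicode, pre-encoding) label that falls
    in a blacklisted block. The table is only an illustrative subset of the
    list above, with ranges taken from the Unicode block boundaries; the
    names BLACKLISTED_BLOCKS and blacklisted_chars are hypothetical and not
    part of any existing IDN library.

```python
# Illustrative subset of the blacklisted blocks, as (first, last, name).
# The remaining blocks from the list would be added in the same way.
BLACKLISTED_BLOCKS = [
    (0x1680, 0x169F, "Ogham"),
    (0x16A0, 0x16FF, "Runic"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x2700, 0x27BF, "Dingbats"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xFB00, 0xFB4F, "Alphabetic Presentation Forms"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
]


def blacklisted_chars(label):
    """Return (position, code point, block name) for each blacklisted character."""
    hits = []
    for pos, ch in enumerate(label):
        cp = ord(ch)
        for first, last, name in BLACKLISTED_BLOCKS:
            if first <= cp <= last:
                hits.append((pos, "U+%04X" % cp, name))
                break  # blocks do not overlap, so one match is enough
    return hits


# U+2215 DIVISION SLASH sits in Mathematical Operators, so it is flagged.
print(blacklisted_chars("pay\u2215pal"))
# U+0259 SCHWA sits in IPA Extensions, which is no longer blacklisted.
print(blacklisted_chars("az\u0259rbaycan"))
```

    A label would be rejected whenever the returned list is non-empty,
    regardless of what the registrar's own whitelist allowed.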

    Remember that spoofing is a numbers game, and spoofers can be very
    ingenious. As the number of spoofable glyphs increases, so do their
    options. Conversely, as we cut down their character repertoire, their
    capability for spoofing falls steeply, since the number of look-alike
    variants of a name is roughly the product of the choices available at
    each character position. That in turn makes other anti-spoofing
    measures far more effective.
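
    To put rough numbers on this, the sketch below counts the spoofed
    variants of a name when each character can independently be swapped
    for any of its look-alikes. The two look-alike tables are made up for
    illustration (they are not real confusability data), and
    spoof_variants is a hypothetical helper.

```python
from math import prod


def spoof_variants(name, lookalikes):
    """Count variants of `name` other than the name itself.

    Each position can keep its character or use any listed look-alike,
    so the total is the product of the per-position choices, minus one
    for the unmodified name.
    """
    return prod(1 + len(lookalikes.get(ch, ())) for ch in name) - 1


# Hypothetical tables: a wide repertoire versus a cut-down one.
WIDE = {
    "a": ["\u0430"],  # CYRILLIC SMALL LETTER A
    "p": ["\u0440"],  # CYRILLIC SMALL LETTER ER
    "y": ["\u0443"],  # CYRILLIC SMALL LETTER U
    "l": ["\u2113"],  # SCRIPT SMALL L (Letterlike Symbols)
}
NARROW = {"a": ["\u0430"]}

print(spoof_variants("paypal", WIDE))    # 2**6 - 1 = 63
print(spoof_variants("paypal", NARROW))  # 2**2 - 1 = 3
```

    Under these assumptions, removing three of the four look-alike
    sources cuts the spoofer's options for this one name from 63 to 3.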

    Are there any objections to blacklisting characters in the remaining
    ranges within IDNs, based on usage in any known living language?
    (Remembering that the spec can always be changed in future if there is
    some unexpected revival in the use of any dead-language alphabet).

    -- Neil


