Re: IDN and Missed Normalisations

From: K.G Sulochana (sulochana@cdactvm.in)
Date: Wed May 09 2007 - 00:05:28 CDT


    ----- Original Message -----
    From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
    To: <unicode@unicode.org>
    Sent: Monday, May 07, 2007 8:26 PM
    Subject: IDN and Missed Normalisations

    > The present standard for International Domain Name Processing (nameprep -
    > RFC 3491 and stringprep - RFC 3454) currently operates with four steps:
    > mapping, normalisation (NFKC), prohibition and bidi checking. Mapping
    > replaces single characters by sequences, which may be empty. It is composed
    > of two elements - deletion of default ignorables, and full case-folding,
    > complicated because it is done before compatibility decomposition. (I may
    > have missed some minor wrinkles in mapping.)
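
The four steps described above can be sketched roughly as follows. This is only an illustration, assuming Python's standard `stringprep` and `unicodedata` modules; the function name `nameprep_sketch` is made up for this example, and the choice of tables follows my reading of RFC 3491. Note that `stringprep` is built on the Unicode 3.2 tables, while `unicodedata.normalize` uses whatever Unicode version ships with the interpreter, so results for newer characters may differ from a strict implementation.

```python
import stringprep
import unicodedata


def nameprep_sketch(label: str) -> str:
    """Illustrative four-step pipeline: map, normalise, prohibit, bidi check."""
    # Step 1: mapping. Table B.1 characters map to nothing (deleted);
    # table B.2 is case folding designed for use with NFKC.
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            continue
        mapped.append(stringprep.map_table_b2(ch))
    label = "".join(mapped)

    # Step 2: normalisation (NFKC).
    label = unicodedata.normalize("NFKC", label)

    # Step 3: prohibition. Reject any character found in these C tables.
    prohibited = (
        stringprep.in_table_c12, stringprep.in_table_c22,
        stringprep.in_table_c3, stringprep.in_table_c4,
        stringprep.in_table_c5, stringprep.in_table_c6,
        stringprep.in_table_c7, stringprep.in_table_c8,
        stringprep.in_table_c9,
    )
    for ch in label:
        if any(check(ch) for check in prohibited):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")

    # Step 4: bidi checking. If any R/AL character is present, no L
    # character may appear, and the label must start and end with R/AL.
    rand_al = [stringprep.in_table_d1(ch) for ch in label]
    if any(rand_al):
        if any(stringprep.in_table_d2(ch) for ch in label):
            raise ValueError("label mixes right-to-left and left-to-right characters")
        if not (rand_al[0] and rand_al[-1]):
            raise ValueError("right-to-left label must start and end with R/AL")
    return label
```

For instance, `nameprep_sketch("Stra\u00DFe")` folds the sharp s during the mapping step, and a soft hyphen (U+00AD) is simply deleted.
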
    >
    > The purpose of normalisation here is to remove homographs. In general this
    > only works within a script - confusion caused by mixing scripts has to be
    > handled by other means. However, there appear to be gaps in Unicode
    > normalisation which cannot now be corrected in the standard normalisations.
    > Some of these may be genuine omissions - in other cases there may be valid
    > disputes as to whether some sequences should be equivalent. There is also a
    > normalisation problem with combining characters of class 0, partly dealt
    > with by the Unicode standard defining the 'proper' sequencing in common
    > cases.
    >
    > Who is keeping track of these omissions for the purposes of IDN? Known
    > examples include decompositions of Devanagari independent vowels (Unicode
    > does not define any such decompositions) and unligated Latin digraphs.
    // Other Indic scripts have similar problems. In Malayalam, five independent vowels and one vowel sign have decompositions that are not included in the normalisation charts. Ideally they would be included, but I understand that for stability reasons this cannot be done now. We will have to find some other method to prevent homograph spoofing arising from this. We are planning to provide additional normalisation charts to the IDN registrars. //
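
The gap can be checked directly. The sketch below uses the Devanagari case mentioned in the quoted text: the independent vowel AA (U+0906) against the sequence A + vowel sign AA (U+0905 U+093E), a pair that can look alike depending on the font and rendering engine. It assumes Python's standard `unicodedata` module; the helper name `nfkc_equal` is made up for this example. For contrast, it also shows a Malayalam vowel sign whose decomposition is defined, so NFKC does unify it.

```python
import unicodedata

# Devanagari independent vowel AA, encoded atomically.
AA_ATOMIC = "\u0906"
# Letter A followed by vowel sign AA: a possible look-alike sequence.
AA_SEQUENCE = "\u0905\u093E"

# Malayalam vowel sign OO and its canonical decomposition (E + AA signs).
OO_SIGN = "\u0D4B"
OO_SEQUENCE = "\u0D47\u0D3E"


def nfkc_equal(a: str, b: str) -> bool:
    """True if the two strings are identical after NFKC normalisation."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# No decomposition is defined for U+0906, so NFKC keeps the two apart.
print(repr(unicodedata.decomposition(AA_ATOMIC)))
print(nfkc_equal(AA_ATOMIC, AA_SEQUENCE))

# U+0D4B has a canonical decomposition, so NFKC treats both forms alike.
print(unicodedata.decomposition(OO_SIGN))
print(nfkc_equal(OO_SIGN, OO_SEQUENCE))
```

Any pair for which `nfkc_equal` returns False passes through the normalisation step as two distinct labels, which is why a supplementary chart would be needed.
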
    > Conjuncts in Indic scripts (both Indian and non-Indian) are another
    > potential problem area.
    // This requires a study of the complete glyph set of each language and the preparation of variant tables. //
    > Solutions may range from banning combinations (not
    > currently a stringprep option) to customising Unicode normalisation (also
    > not currently a stringprep option) - formally there is little difference,
    > for the step after normalisation in the processing is prohibition.
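
To illustrate why the two approaches are formally close, here is a minimal sketch of both, applied after NFKC. Everything in it is hypothetical: the table `EXTRA_EQUIVALENTS` and the functions `fold_extra` and `ban_extra` are made up for this example, and the single Devanagari entry only stands in for whatever a real supplementary chart would contain. Neither step is part of stringprep as it stands.

```python
import unicodedata

# Hypothetical supplementary chart: look-alike sequence -> preferred form.
# The one entry here (A + vowel sign AA -> independent vowel AA) is only
# an illustration, not an agreed table.
EXTRA_EQUIVALENTS = {
    "\u0905\u093E": "\u0906",
}


def fold_extra(label: str) -> str:
    """Customised normalisation: NFKC, then fold the extra sequences."""
    label = unicodedata.normalize("NFKC", label)
    for sequence, preferred in EXTRA_EQUIVALENTS.items():
        label = label.replace(sequence, preferred)
    return label


def ban_extra(label: str) -> str:
    """Banning: NFKC, then reject any label containing an extra sequence."""
    label = unicodedata.normalize("NFKC", label)
    for sequence in EXTRA_EQUIVALENTS:
        if sequence in label:
            raise ValueError("label contains a banned look-alike sequence")
    return label
```

With either function only one spelling of the vowel can end up in a registered label; the difference is whether the other spelling is silently folded or refused.
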
    >
    Sulochana




    This archive was generated by hypermail 2.1.5 : Wed May 09 2007 - 00:05:28 CDT