RE: Case mapping of dotless lowercase letters

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Tue Dec 16 2003 - 11:41:27 EST


    > > Since {U+0069} is /not/ canonically equivalent to {U+0131}{U+0307},
    > > I don't see anything to stop me from registering the domain name
    > > "un{U+0131}{U+0307}code.org", for example. It /is/ in NFC, after
    > > all.
    >
    > You can (or rather, you will be able to when internationalized domain
    > names become a reality). But in fact you have to use case folding

    Yes. And as it happens, dotless-i case-*folds* to (soft)dotted-i,
    so you cannot register an IDN that after "nameprep" has a dotless-i
    in it, since that name isn't correctly "nameprepped".
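    The quoted claim that {U+0069} and {U+0131}{U+0307} are not
    canonically equivalent, while the latter sequence is already in NFC,
    can be checked with Python's standard `unicodedata` module. This is
    only an illustrative sketch; the helper name `canonically_equivalent`
    is hypothetical and simply compares the NFD forms of two strings.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence: the canonical decompositions (NFD) are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

dotted_i = "\u0069"                # LATIN SMALL LETTER I
dotless_plus_dot = "\u0131\u0307"  # LATIN SMALL LETTER DOTLESS I + COMBINING DOT ABOVE

print(canonically_equivalent(dotted_i, dotless_plus_dot))  # False
# No precomposed character exists for the pair, so NFC leaves the name unchanged:
print(unicodedata.normalize("NFC", "un\u0131\u0307code.org") == "un\u0131\u0307code.org")  # True
# For contrast, a genuinely equivalent pair: U+00E9 vs. e + COMBINING ACUTE ACCENT
print(canonically_equivalent("\u00e9", "e\u0301"))  # True
```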

    This does not guard against <(soft)dotted-i, dot-above>, but for
    the registered part of a domain name, registrars are *supposed* to
    have rules for what is allowed and what is not (for that particular
    registrar). E.g. the Swedish domain name registry *currently* allows
    only ASCII letters plus åäöé (after "nameprep") in the domain names
    it registers, though this may be somewhat extended in the future
    (to cover Sami at least, maybe more). This kind of solution was
    driven mainly by the Traditional vs. Simplified Chinese problem,
    but the same approach applies to cases like <dotless i, dot-above>
    too.
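    A registry-side whitelist of the kind described above might be
    sketched as follows. This is a hypothetical policy, not the Swedish
    registry's actual rule set: it assumes the label has already been
    nameprepped, and it assumes digits and hyphen are allowed alongside
    the ASCII letters and åäöé. The names `ALLOWED` and `label_allowed`
    are illustrative only.

```python
import string

# Hypothetical registry policy (assumption): ASCII lowercase letters, digits
# and hyphen, plus å, ä, ö, é. Assumes the label is already nameprepped.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-" + "\u00e5\u00e4\u00f6\u00e9")

def label_allowed(label: str) -> bool:
    """Return True if the label is non-empty and uses only ALLOWED code points."""
    return bool(label) and all(ch in ALLOWED for ch in label)

print(label_allowed("r\u00e4ksm\u00f6rg\u00e5s"))  # True  (räksmörgås)
print(label_allowed("un\u0131code"))              # False (dotless i not listed)
print(label_allowed("uni\u0307code"))             # False (combining dot above not listed)
```

    Such a check rejects a stray {U+0307} even though nameprep itself
    lets combining marks through.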

    > plus NFKC, and there is a list of forbidden characters as well.
    > See RFCs 3454 and 3491 for the exact rules.

    No letter is forbidden (though several are case-folded to the same
    letter), nor is any 'graphic' combining mark.
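    The nameprep steps discussed above (mapping with case folding,
    NFKC, then prohibition) can be observed with CPython's
    `encodings.idna.nameprep`. That function is an internal helper of
    the `idna` codec implementing RFC 3491, not a documented public API,
    so this sketch assumes a CPython build that still provides it.

```python
from encodings.idna import nameprep  # CPython-internal RFC 3491 helper (assumption: present)

# Mapping step: case folding (RFC 3454 table B.2) and removal of
# SOFT HYPHEN U+00AD (table B.1, "commonly mapped to nothing").
print(nameprep("Uni\u00adCode"))  # unicode
# NFKC step folds compatibility characters such as fullwidth letters.
print(nameprep("\uff35\uff4e\uff49\uff43\uff4f\uff44\uff45"))  # unicode
# Combining marks are not prohibited: <i, COMBINING DOT ABOVE> survives.
print(nameprep("uni\u0307code") == "uni\u0307code")  # True
# Prohibition step: private-use code points (table C.3) are rejected.
try:
    nameprep("uni\ue000code")
except UnicodeError as exc:
    print("rejected:", exc)
```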

                    /kent k



    This archive was generated by hypermail 2.1.5 : Tue Dec 16 2003 - 12:53:49 EST